git-annex

Author	SHA1	Message	Date
Joey Hess	ec12537774	defer write permissions checking in import until after copy to repo This should complete the fix started in `6329997ac4`, fixing the actual cause of the test suite failure this time. Sponsored-by: Dartmouth College's Datalad project	2021-09-02 13:45:21 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	837116ef1e	Fix support for readonly git remotes Boolean blindness oops. (Reversion in version 8.20210621) Sponsored-by: Dartmouth College's Datalad project	2021-08-30 12:34:19 -04:00
Joey Hess	a99a84f342	add: Detect when xattrs or perhaps ACLs prevent locking down a file's content And fail with an informative message. I don't think ACLs can prevent removing the write bit, but I'm not sure, so kept it mentioning them as a possibility. Should git-annex lock also check if the write bits are able to be removed? Maybe, but the case I know about with xattrs involves cp -a copying NFS xattrs, and it's the copy of the file that is the problem. So when locking a file, I guess it will not be the copy. Sponsored-by: Dartmouth College's Datalad project	2021-08-27 14:33:01 -04:00
Joey Hess	e17342b2a0	Run cp -a with --no-preserve=xattr, to avoid problems with copied xattrs Including them breaking permissions setting on some NFS servers. Sponsored-by: Dartmouth College's Datalad project	2021-08-27 13:09:34 -04:00
Joey Hess	6d4a728455	Added annex.youtube-dl-command config This can be used to run some forks of youtube-dl. Sponsored-by: Brett Eisenberg on Patreon	2021-08-27 09:44:23 -04:00
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	4ed36b2634	Fix test suite failure on Windows It would be better if the Arbitrary instance avoided generating impossible filenames like "foo/c:bar", but probably this is the only place that splits the file from the directory and then uses the file without the directory.. At least on the quickcheck properties. Sponsored-by: Svenne Krap on Patreon	2021-08-24 14:03:29 -04:00
Joey Hess	f9b92c81f6	unused: Skip the refs/annex/last-index ref that git-annex recently started creating This was unlikely to cause any problem, but it is unsightly to mention normally hidden refs, and it might have done a bit of unnecessary work to check that ref. Sponsored-by: Noam Kremen on Patreon	2021-08-24 12:58:14 -04:00
Joey Hess	53744e132d	incremental verification for gitlfs and httpalso And that should be all the special remotes supporting it on linux now, except for in the odd edge case here and there. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:17:10 -04:00
Joey Hess	f5e09a1dbe	incremental verification for S3 Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:07:00 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	8613770b06	incremental verify for webdav special remote Sponsored-by: Dartmouth College's DANDI project	2021-08-16 17:29:32 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	ec82299730	status update I was wrong about S3 supporting tailVerify.	2021-08-16 15:15:32 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	7eb3742e4b	incremental verify for chunked remotes Simply feed each chunk in turn to the incremental verifier. When resuming an interrupted retrieve, it does not do incremental verification. That would need to read the file, up to the resume point, and feed it to the incremental verifier. That seems easy to get wrong. Also it would mean extra work done before the transfer can start. Which would complicate displaying progress, and would perhaps not appear to the user as if it was resuming from where it left off. Instead, in that situation, return UnVerified, and let the verification be done in a separate pass. Granted, Annex.CopyFile does manage all that, but it's not complicated by dealing with chunks too. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:42:49 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	f1176f82a5	rsync special remote: Stop displaying rsync progress, and use git-annex's own progress display Reasons are same as in commit `cee14f147a`. (It was already done when using -J.) Sponsored-by: Mark Reidenbach on Patreon	2021-08-09 12:06:10 -04:00
Joey Hess	1acdd18ea8	deal better with clock skew situations, using vector clocks * Deal with clock skew, both forwards and backwards, when logging information to the git-annex branch. * GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1) rather than needing to be advanced each time a new change is made. * Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex. When changing a file in the git-annex branch, the vector clock to use is now determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK when set), and comparing it to the newest vector clock already in use in that file. If a newer time stamp was already in use, advance it forward by a second instead. When the clock is set to a time in the past, this avoids logging with an old timestamp, which would risk that log line later being ignored in favor of "newer" line that is really not newer. When a log entry has been made with a clock that was set far ahead in the future, this avoids newer information being logged with an older timestamp and so being ignored in favor of that future-timestamped information. Once all clocks get fixed, this will result in the vector clocks being incremented, until finally enough time has passed that time gets back ahead of the vector clock value, and then it will return to usual operation. (This latter situation is not ideal, but it seems the best that can be done. The issue with it is, since all writers will be incrementing the last vector clock they saw, there's no way to tell when one writer made a write significantly later in time than another, so the earlier write might arbitrarily be picked when merging. This problem is why git-annex uses timestamps in the first place, rather than pure vector clocks.) Advancing forward by 1 second is somewhat arbitrary. setDead advances a timestamp by just 1 picosecond, and the vector clock could too. But then it would interfere with setDead, which wants to be overruled by any change. So it could use 2 picoseconds or something, but that seems weird. It could just as well advance it forward by a minute or whatever, but then it would be harder for real time to catch up with the vector clock when forward clock slew had happened. A complication is that many log files contain several different pieces of information, and it may be best to only use vector clocks for the same piece of information. For example, a key's location log file contains InfoPresent/InfoMissing for each UUID, and it only looks at the vector clocks for the UUID that is being changed, and not other UUIDs. Although exactly where the dividing line is can be hard to determine. Consider metadata logs, where a field "tag" can have multiple values set at different times. Should it advance forward past the last tag? Probably. What about when a different field is set, should it look at the clocks of other fields? Perhaps not, but currently it does, and this does not seem like it will cause any problems. Another one I'm not entirely sure about is the export log, which is keyed by (fromuuid, touuid). So if multiple repos are exporting to the same remote, different vector clocks can be used for that remote. It looks like that's probably ok, because it does not try to determine what order things occurred when there was an export conflict. Sponsored-by: Jochen Bartl on Patreon	2021-08-04 12:33:46 -04:00
Joey Hess	899983058f	add: When adding a dotfile, avoid treating its name as an extension.	2021-08-03 12:22:58 -04:00
Joey Hess	9cae7c5bbf	releasing package git-annex version 8.20210803	2021-08-03 12:20:45 -04:00
Joey Hess	b3c4579c79	work around strange auto-init bug git-annex get when run as the first git-annex command in a new repo did not populate unlocked files. (Reversion in version 8.20210621) I am not entirely happy with this, because I don't understand how `428c91606b` caused the problem in the first place, and I don't fully understand how skipping calling scanAnnexedFiles during autoinit avoids the problem. Kept the explicit call to scanAnnexedFiles during git-annex init, so that when reconcileStaged is expensive, it can be made to run then, rather than at some later point when the information is needed. Sponsored-by: Brock Spratlen on Patreon	2021-07-30 18:36:03 -04:00
Joey Hess	66089e97de	Fix a rounding bug in display of data sizes Eg, showImprecise 1 1.99 returned "1.1" rather than "2". The 9 rounded upward to 10, and that was wrongly used as the decimal, rather than carrying the 1. Sponsored-by: Jack Hill on Patreon	2021-07-30 09:56:04 -04:00
Joey Hess	d2aead67bd	fsck: Detect and correct stale or missing inode caches for object files An easy way to see this in action is to have an unlocked file, and touch the object file. While all code that compares inode caches for object files needs to be prepared for this kind of problem and fall back to verification, having fsck notice it and correct it is cheap (as long as fsck is being run anyway) and ensures that if it happens for some unusual reason, there's a way for the user to notice that it's happening. Note that, when annex.thin is in use, the earlier call to isUnmodified (and also potentially earlier calls to inAnnex in eg, verifyLocationLog) will fix up the same problem silently. That might prevent the warning being displayed, although probably it still will be, because the Database.Keys write of the InodeCache will be queued but will not have happened yet. I can't see a way to improve this, but it's not great. Sponsored-by: Dartmouth College's Datalad project	2021-07-29 14:06:42 -04:00
Joey Hess	73e0cbbb19	fix problem populating pointer files This is a result of an audit of every use of getInodeCaches, to find places that misbehave when the annex object is not in the inode cache, despite pointer files for the same key being in the inode cache. Unfortunately, that is the case for objects that were in v7 repos that upgraded to v8. Added a note about this gotcha to getInodeCaches. Database.Keys.reconcileStaged, when annex.thin is set, would fail to populate pointer files in this situation. Changed it to check if the annex object is unmodified the same way inAnnex does, falling back to a checksum if the inode cache is not recorded. Sponsored-by: Dartmouth College's Datalad project	2021-07-27 14:26:49 -04:00
Joey Hess	3b5a3e168d	check if object is modified before starting to send it Fix bug that caused some transfers to incorrectly fail with "content changed while it was being sent", when the content was not changed. While I don't know how to reproduce the problem that several people reported, it is presumably due to the inode cache somehow being stale. So check isUnmodified', and if it's not modified, include the file's current inode cache in the set to accept, when checking for modification after the transfer. That seems like the right thing to do for another reason: The failure says the file changed while it was being sent, but if the object file was changed before the transfer started, that's wrong. So it needs to check before allowing the transfer at all if the file is modified. (Other calls to sameInodeCache or elemInodeCaches, when operating on inode caches from the database, could also be problematic if the inode cache is somehow getting stale. This does not address such problems.) Sponsored-by: Dartmouth College's Datalad project	2021-07-26 17:33:49 -04:00
Joey Hess	3d50b47ded	sync, merge: Added --allow-unrelated-histories option Which is the same as the git merge option. After last commit, this turns out to be needed in the test suite, and when doing git-annex import from special remote, followed by a git-annex merge. Sponsored-by: Svenne Krap on Patreon	2021-07-19 12:14:26 -04:00
Joey Hess	b6bea0d3f2	remove direct mode remnant of merging unrelated histories sync, merge, post-receive: Avoid merging unrelated histories, which used to be allowed only to support direct mode repositories. (However, sync does still merge unrelated histories when importing trees from special remotes, and the assistant still merges unrelated histories always.) See `556b2ded2b` for why this was added back in 2016, for direct mode. This is a behavior change, which might break something that was relying on sync merging unrelated histories, but git had a good reason to prevent it, since it's easy to foot shoot with it, and git-annex should follow suit. Sponsored-by: Noam Kremen on Patreon	2021-07-19 11:41:26 -04:00
Joey Hess	33a80d083a	sync --quiet * sync: When --quiet is used, run git commit, push, and pull without their usual output. * merge: When --quiet is used, run git merge without its usual output. This might also make --quiet work better for some other commands that make commits, like git-annex adjust. Sponsored-by: Kevin Mueller on Patreon	2021-07-19 11:28:47 -04:00
Joey Hess	c952c485c8	Fix retrieval of content from borg repos accessed over ssh It was making the borgrepo path absolute.. even when it was a ssh repository. Made BorgRepo a newtype, to guard against accidentally treating it like a FilePath. Sponsored-by: Graham Spencer on Patreon	2021-07-15 12:39:24 -04:00
Joey Hess	dd31fe7b9e	fall back to checking lower case hash directories in normal repo Fix a bug that prevented getting content from a repository that started out as a bare repository, or had annex.crippledfilesystem set, and was converted to a non-bare repository. This unfortunately means that inAnnex check gets slowed down by a stat call in normal repos when the content is not present. Oh well, such is the cost of backwards compatibility with old mistakes. Sponsored-by: Mark Reidenbach on Patreon	2021-07-15 12:16:31 -04:00
Joey Hess	47d3dccf19	whereused implemented except --historical Sponsored-by: Jack Hill on Patreon	2021-07-14 14:27:21 -04:00
Joey Hess	065db484e0	releasing package git-annex version 8.20210714	2021-07-14 12:23:24 -04:00
Joey Hess	8885bd3c5b	assistant: honor annex.delayadd for non-large files assistant: When adding non-large files to git, honor annex.delayadd configuration. Also, don't add non-large files to git when they are still being written to. This came for free, since the changes to non-large files get queued up with the ones to large files, and run through the lsof check. Sponsored-by: Luke Shumaker on Patreon	2021-07-13 12:17:00 -04:00
Joey Hess	a6767ca81f	close bug and mention another aspect of the reversion in changelog	2021-07-12 10:45:57 -04:00
Joey Hess	6a581f8b8b	fix init reversion when core.sharedRepository = group init: Fix misbehavior when core.sharedRepository = group that caused it to enter an adjusted branch. (Reversion in version 8.20210630) Commit `4b1b9d7a83` made init call freezeContent in case there was a hook that could prevent writing in situations where perms don't. But with the above git config, freezeContent does not prevent write at all. So init needs to do what freezeContent does with a non-shared git config. Or init could check for that config, and skip the probing, since it won't actually be preventing write to any files. But that would make init too aware of details of Annex.Perms, and also would break if the git config were changed after init. Sponsored-by: Dartmouth College's Datalad project	2021-07-12 10:15:49 -04:00
Joey Hess	b885007f0e	--debug output goes to stderr again, not stdout Reversion in version 8.20210428 Sponsored-by: Dartmouth College's Datalad project	2021-07-12 09:40:38 -04:00
Joey Hess	b9db859221	addurl: Avoid crashing when used on beegfs. Sponsored-by: Dartmouth College's DANDI project	2021-07-05 13:02:40 -04:00
Joey Hess	d2c48404a8	assistant: Avoid unnecessary git repository repair In a situation where git fsck gets confused about a commit that is made while it's running. Sponsored-by: Graham Spencer on Patreon	2021-06-30 18:00:16 -04:00
Joey Hess	fd99ce6c95	releasing package git-annex version 8.20210630	2021-06-30 11:48:33 -04:00
Joey Hess	73ccf34763	closing	2021-06-30 11:47:39 -04:00
Joey Hess	6b0d732746	repair: Fix reversion in version 8.20200522 that prevented fetching missing objects from remotes In commit `dfc4e641b5` git repair was changed to use remote name, not url, when fetching. But it fetches into a temporary git repo, which doesn't have remotes configured. Oops. (In my defense, that commit was made just as covid lockdown started. But testing? Urk.) Sponsored-by: Mark Reidenbach on Patreon	2021-06-29 13:15:15 -04:00
Joey Hess	199391befe	make repair interruption safe Fixed bug that interrupting git-annex repair (or assistant) while it was fixing repository corruption would lose objects that were contained in pack files. Unpack all pack files and move objects into place before deleting the pack files. The old approach moved the pack files to a temp directory before unpacking them, which was not interruption safe. Sponsored-By: Jochen Bartl on Patreon	2021-06-29 13:14:28 -04:00
Joey Hess	b8e32e200e	addurl, importfeed: Added --no-raw option Forces eg, download with youtube-dl without falling back to raw download. Since youtube-dl failing due to an url not being supported is difficult to distinguish from it failing due to being blocked in some way, this can be useful to avoid the fallback of git-annex downloading the raw web page and adding that. Since --raw also prevents using special remotes, --no-raw also allows special remote downloads. Although it's always possible that some special remote may claim an url and fall back to raw download of the content, which --no-raw cannot prevent. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-06-27 11:14:51 -04:00
Joey Hess	3a14648142	dropping unused marks as dead Dropping an object with drop --unused or dropunused will mark it as dead, preventing fsck --all from complaining about it after it's been dropped from all repositories. If another repository still has a copy, it won't be treated as dead until it's also dropped from there. The drop has to use --unused, can't be --key or something else, because this indicates that the user has recently run git-annex unused. If it checked the unused log on every drop, bad things would happen when the unused log was out of date, eg a file used to be unused but then got re-added. Marking such a file as dead could be confusing. When the user uses --unused/dropunused, they must consider the unused information to be up-to-date. The particular workflow this enables is: git annex add foo git annex unannex foo git annex unused git annex drop --unused / dropunused git annex fsck --all # no warnings The docs for git-annex unannex say to use git-annex unused and dropunused, so the user should be pointed in this direction when they want to undo an accidental add. Sponsored-by: Brock Spratlen on Patreon	2021-06-25 15:22:26 -04:00
Joey Hess	df2001aa88	Improve display of errors when transfers fail Transfers from or to a local git repo could fail without a reason being given, if the content failed to verify, or if the object file's stat changed while it was being copied. Now display messages in these cases. Sponsored-by: Jack Hill on Patreon	2021-06-25 13:17:04 -04:00
Joey Hess	4b1b9d7a83	Added annex.freezecontent-command and annex.thawcontent-command configs Freeze first sets the file perms, and then runs freezecontent-command. Thaw runs thawcontent-command before restoring file permissions. This is in case the freeze command prevents changing file perms, as eg setting a file immutable does. Also, changing file perms tends to mess up previously set ACLs. git-annex init's probe for crippled filesystem uses them, so if file perms don't work, but freezecontent-command manages to prevent write to a file, it won't treat the filesystem as crippled. When the filesystem has been probed as crippled, the hooks are not used, because there seems to be no point then; git-annex won't be relying on locking annex objects down. Also, this avoids them being run when the file perms have not been changed, in case they somehow rely on git-annex's setting of the file perms in order to work. Sponsored-by: Dartmouth College's Datalad project	2021-06-21 14:40:52 -04:00
Joey Hess	1cc7b2661e	push synced/master before synced/git-annex sync: Partly work around github behavior that first branch to be pushed to a new repository is assumed to be the head branch, by not pushing synced/git-annex first. github expects master (or whatever the name is) to be pushed first, but git-annex sync can't, because it's got to also support pushes to non-bare repos where pushing master fails, as explained in the big comment. So pushing synced/master is not entirely a fix, but at least it makes github default to a branch with the stuff the user expects in it, not a bunch of annex log files. Aside from fixing github to not make this assumption, or improving the git push protocol to include what the current HEAD is, the only other approach I can think of is to identify git push's progress messages and display those when pushing master, while filtering out error messages about non-fast-forward etc. But git doesn't provide a way to separate out or identify its progress messages. Sponsored-by: Luke Shumaker on Patreon	2021-06-21 12:32:21 -04:00
Joey Hess	a6e281e008	releasing package git-annex version 8.20210621	2021-06-21 12:17:46 -04:00
Joey Hess	d2be68907c	drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies Eg, before with a .gitattributes like: .2 annex.numcopies=2 .1 annex.numcopies=1 And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2 would succeed, leaving just 1 copy, despite foo.2 needing 2 copies. It dropped foo.1 first and then skipped foo.2 since its content was gone. Now that the keys database includes locked files, this longstanding wart can be fixed. Sponsored-by: Noam Kremen on Patreon	2021-06-15 11:38:44 -04:00
Joey Hess	af9fdf5dba	verify associated files when checking numcopies Most of this is just refactoring. But, handleDropsFrom did not verify that associated files from the keys db were still accurate, and has now been fixed to. A minor improvement to this would be to avoid calling catKeyFile twice on the same file, when getting the numcopies and mincopies value, in the common case where the same file has the highest value for both. But, it avoids checking every associated file, so it will scale well to lots of dups already. Sponsored-by: Kevin Mueller on Patreon	2021-06-15 11:14:52 -04:00
Joey Hess	3af4c9a29a	fix exponential blowup when adding lots of identical files This was an old problem when the files were being added unlocked, so the changelog mentions that being fixed. However, recently it's also affected locked files. The fix for locked files is kind of stupidly simple. moveAnnex already handles populating unlocked files, and only does it when the object file was not already present. So remove the redundant populateUnlockedFiles call. (That call was added all the way back in `cfaac52b88`, and has always been unnecessary.) Sponsored-by: Dartmouth College's Datalad project	2021-06-15 09:45:55 -04:00
Joey Hess	78da00c7a6	Future proof activity log parsing When the log has an activity that is not known, eg added by a future version of git-annex, it used to be treated as no activity at all, which would make git-annex expire think it should expire the repository, despite it having some kind of recent activity. Hopefully there will be no reason to add a new activity until enough time has passed that this commit is in use everywhere. Sponsored-by: Jake Vosloo on Patreon	2021-06-14 14:18:19 -04:00
Joey Hess	771a122c9e	add --size-limit option When this option is not used, there should be effectively no added overhead, thanks to the optimisation in `b3cd0cc6ba`. When an action fails on a file, the size of the file still counts toward the size limit. This was necessary to support concurrency, but also generally seems like the right choice. Most commands that operate on annexed files support the option. export and import do not, and I don't know if it would make sense for export to.. Why would you want an incomplete export? sync doesn't, and while it would be easy to make it support it for transferring files, it's not clear if dropping files should also take the size limit into account. Commands like add that don't operate on annexed files don't support the option either. Exiting 101 not yet implemented. Sponsored-by: Denis Dzyubenko on Patreon	2021-06-04 16:16:53 -04:00
Joey Hess	189fb05ffb	Added annex.adviceNoSshCaching config. Sponsored-by: Brock Spratlen on Patreon	2021-05-27 12:37:49 -04:00
Joey Hess	b5f5475ed6	New matching options --excludesamecontent and --includesamecontent The normalisation of filenames turns out to be the tricky part here, because the associated files coming out of the keys db may look like "./foo/bar" or "../bar". For the former to match a glob like "foo/", it needs to be normalised. Note that, on windows, normalise "./foo/bar" = "foo\\bar" which a glob like "foo/" won't match. So the glob is matched a second time, on the toInternalGitPath, so allowing the user to provide a glob with the slashes in either direction. However, this still won't support some wacky edge cases like the user providing a glob of "foo/bar\\*" Sponsored-by: Dartmouth College's Datalad project	2021-05-25 13:08:18 -04:00
Joey Hess	cedc28a783	prevent dropping required content of other file using same content When two files have the same content, and a required content expression matches one but not the other, dropping the latter file will fail as it would also remove the content of the required file. This will slow down drop (w/o --auto), dropunused, mirror, and move, by one keys db lookup per file. But I did include an optimisation to avoid a double db lookup in the drop --auto / sync --content case. I suspect that dropunused could also use PreferredContentChecked True, but haven't entirely thought it through and it's rarely used with enough files for the optimisation to matter. Sponsored-by: Dartmouth College's Datalad project	2021-05-25 11:34:06 -04:00
Joey Hess	7029ef1c3d	improve changelog	2021-05-25 10:08:29 -04:00
Joey Hess	5d18994736	clearer language	2021-05-24 14:54:51 -04:00
Joey Hess	a56b151f90	fix longstanding indeterminate preferred content for duplicated file problem * drop: When two files have the same content, and a preferred content expression matches one but not the other, do not drop the file. * sync --content, assistant: Fix an edge case where a file that is not preferred content did not get dropped. The sync --content edge case is that handleDropsFrom loaded associated files and used them without verifying that the information from the database was not stale. It seemed best to avoid changing --want-drop's behavior, this way when debugging a preferred content expression with it, the files matched will still reflect the expression. So added a note to the --want-drop documentation, to make clear it may not behave identically to git-annex drop --auto. While it would be possible to introspect the preferred content expression to see if it matches on filenames, and only look up the associated files when it does, it's generally fairly rare for 2 files to have the same content, and the database lookup is already avoided when there's only 1 file, so I did not implement that further optimisation. Note that there are still some situations where the associated files database does not get locked files recorded in it, which will prevent this fix from working. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:07:05 -04:00
Joey Hess	c525d18cf7	filter-branch: New command, useful to produce a filtered version of the git-annex branch, eg when splitting a repository	2021-05-17 14:16:46 -04:00
Joey Hess	8b6dad11a2	add createMessage init: When annex.commitmessage is set, use that message for the commit that creates the git-annex branch. This will be used by filter-branch too, and it seems to make sense to let annex.commitmessage affect it.	2021-05-17 13:07:47 -04:00
Joey Hess	947d2a10bc	assistant: Fix a crash on startup by avoiding using forkProcess ghc 8.8.4 seems to have changed something that broke code that has been successfully using forkProcess since 2012. Likely a change to GC internals. Since forkProcess has never had clear documentation about how to use it safely, avoid using it at all. Instead, when git-annex needs to daemonize itself, re-run the git-annex command, in a new process group and session. This commit was sponsored by Luke Shumaker on Patreon.	2021-05-12 15:08:03 -04:00
Joey Hess	675556fd9a	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unnecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache. git-annex add was changed, when adding a small file, to remove the inode cache for it. This is necessary to keep the recipe in doc/tips/largefiles.mdwn for converting from annex to git working. It also avoids bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn which the earlier try at this change introduced.	2021-05-10 13:20:10 -04:00
Joey Hess	72a8bbce12	Revert "smudge: check for known annexed inodes before checking annex.largefiles" This reverts commit `424bef6b6f`. This commit caused other buggy behavior unfortunately.	2021-05-10 12:20:13 -04:00
Joey Hess	921753ac44	reinject: Error out when run on a file that is not annexed rather than silently skipping it	2021-05-07 13:31:03 -04:00
Joey Hess	4bf7940d6b	fileRef: make paths relative and simplified Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. To avoid slowing down all the cases where the paths are already ok with an unnecessary call to getCurrentDirectory, put in an optimisation in relPathCwdToFile. That will probably also speed up other parts of git-annex by some small amount, but I have not benchmarked. Note that I did not convert branchFileRef, because it seems likely that it will be used with a file that is not provided by the user, so is already in a sane format. This is certainly true for the way git-annex uses it, though maybe arguable to the extent Git.Ref is a reusable library.	2021-05-07 13:25:59 -04:00
Joey Hess	424bef6b6f	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unnecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache.	2021-05-03 13:26:32 -04:00
Joey Hess	4588668a12	fromkey unlocked files support fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. There is some overlap with code in Annex.Ingest, however it's not quite the same because ingesting has a temp file with the content, where here the content, if any, is in the annex object file. So it eg, makes sense for Annex.Ingest to copy the execute mode of the content file, but it does not make sense for fromkey to do that. Also changed in passing to stage the file in git directly, rather than using git add. One consequence of that is that if the file is gitignored, it will still get added, rather than the old behavior: The following paths are ignored by one of your .gitignore files: ignored hint: Use -f if you really want to add them. hint: Turn this message off by running hint: "git config advice.addIgnoredFile false" git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123) That old behavior was a surprise to me, and so I consider it a bug, and doubt anyone would have relied on it. Note that, when on an --hide-missing branch, it is possible to fromkey a key that is not present (needs --force). The annex link or pointer file still gets written in this case. It doesn't seem to make any sense not to write it, because then fromkey would not do anything useful in this case, and this way the file can be committed and synced to master, and the branch re-adjusted to hide the new missing file. This commit was sponsored by Noam Kremen on Patreon.	2021-05-03 11:26:18 -04:00
Joey Hess	27e5f3cd52	releasing package git-annex version 8.20210428	2021-04-28 12:16:45 -04:00
Joey Hess	0f73b6d03a	Avoid more than 1 gpg password prompt at the same time Which could happen occasionally before when concurrency is enabled. While not much of a problem when it did happen, better to avoid it. Also, since it seems likely the gpg-agent sometimes fails in such a situation, this makes it not happen when running a single git-annex command with concurrency enabled. This commit was sponsored by Jake Vosloo on Patreon.	2021-04-27 16:36:44 -04:00
Joey Hess	a166d2520b	check mincopies is satisfied even when numcopies is known to be satisfied I had been assuming that numcopies would be larger than or equal to mincopies, so no need to check both. But users get confused and use configs that don't really make sense, so make sure to handle mincopies being larger than numcopies. Also add something to the mincopies man page to discourage this misconfiguration. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-04-27 13:37:18 -04:00
Joey Hess	d3e49b210a	git-annex-config: Allow setting annex.securehashesonly Which has otherwise been supported since 2019, but was missing from the list of allowed repo-global configs. Reordered the list to match the order in the git-annex-config man page, to make them easy to cross-compare. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-04-26 13:50:37 -04:00
Joey Hess	8e24fb3507	update	2021-04-26 13:12:51 -04:00
Joey Hess	2b264b3edf	initremote --private	2021-04-23 14:47:46 -04:00
Joey Hess	0547884eb2	importfeed: fix bug while also speeding up 12x! * Fix bug that could make git-annex importfeed not see recently recorded state when configured with annex.alwayscommit=false. * importfeed: Made "checking known urls" phase run 12 times faster. The massive speedup is because it no longer queries for metadata accompanying each url. Instead it processes the whole git-annex branch and checks all metadata files for feed item ids, and uses any it finds. This could result in a behavior change, in an unlikely situation: If a feed id is recorded in a key's metadata, but the url gets removed, the old code would not see that item id and would re-download it if it finds an url for it in a feed, while the new code will see the item id. I don't think the old behavior was intentional, and it may be that the new behavior is better. Not gonna worry about this.	2021-04-23 12:36:56 -04:00
Joey Hess	6eb3c0a6b4	fix branch precacheing bug by checking journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. When not using --all, precaching only gets triggered when the command actually needs location logs, and so there's no speed hit there. This is a minor speed hit for --all, because it precaches even when the location log is not actually going to be used, and so checking the journal is not necessary. It would have been possible to defer checking the journal until the cache gets used. But that would complicate the usual Branch.get code path with two different kinds of caches, and the speed hit is really minimal. A better way to speed up --all, later, would be to avoid precaching at all when the location log is not going to be used.	2021-04-21 14:02:15 -04:00
Joey Hess	e1a9b79fa6	fix hardcoded origin name in checkAdjustedClone init: Fix a crash when the repo was cloned from a repo that had an adjusted branch checked out, and the origin remote is not named "origin". The only other hardcoding of the name of origin is in: - Upgrade.V2, which can be ignored probably - Annex.Branch, which doesn't fail if it has some other name, but just doesn't set up the git-annex branch with quite as linear a history in that case.	2021-04-14 18:53:27 -04:00
Joey Hess	4b048ca042	directory CoW on store Not for exports to directory yet though.	2021-04-14 15:11:00 -04:00
Joey Hess	7bb93896af	directory CoW on retrieve directory: When cp supports reflinks, use it when getting content from a directory special remote. Not yet for imports from directory though, and not for store. Note that, when it's chunked, using cp --reflink would not speed it up, and when reflink was not supported, would unnecessarily write the chunk to a file before reading it back in. So, only using a fileRetriever in the NoChunks case is necessary to keep chunking fast. fileCopier is told not to verify, because the special remote interface does not yet support verification in passing. AFAICS, fileCopier can never return False when not verifying so the added giveup should never actually happen.	2021-04-14 15:05:12 -04:00
Joey Hess	5783a8d081	fsck: avoid redundant checksum when transfer is Verified When downloading content from a remote, if the content is able to be verified during the transfer, skip checksumming it a second time. Note that in this case, the fsck output does not include "(checksum)" which it does when the checksumming is done separately from the download. This commit was sponsored by Brock Spratlen on Patreon.	2021-04-14 13:22:54 -04:00
Joey Hess	8e7dc958d2	forget: Preserve currently exported trees Avoiding problems with exporttree remotes in some unusual circumstances. This commit was sponsored by Brett Eisenberg on Patreon.	2021-04-13 15:00:23 -04:00
Joey Hess	805d325a8d	diffdriver: Support unlocked files	2021-04-08 14:32:09 -04:00
Joey Hess	1b645e1ace	added --debugfilter (and annex.debugfilter)	2021-04-05 15:31:10 -04:00
Joey Hess	ced91b3fbd	Avoid excess commits to the git-annex branch when stall detection is enabled When git-annex transferrer started up, and the journal contained something, it would commit it to the git-annex branch. This caused excess commits to the branch, in cases where normally several changes would be journalled and committed together. That generated some excess git objects and was also just noisy on stdout. Since transferrer uses enableInteractiveBranchAccess, it does not need to commit journalled changes, since the optimisation that avoids checking the journal when reading from the branch is disabled for processes that call that. This commit was sponsored by Svenne Krap on Patreon.	2021-04-02 11:57:18 -04:00
Joey Hess	8868a3a4c7	Fix build with persistent-2.12.0.1 persistent stopped using askLogFunc, and the thing to use is askLoggerIO from monad-logger. Bumped the dep to the first version that contained that. Note that the i386ancient build uses a newer monad-logger than 0.3.10, so the new versioned dep should not break it, and presumably nothing else either. This commit was sponsored by Noam Kremen on Patreon.	2021-04-01 12:21:02 -04:00
Joey Hess	315a81e3c6	releasing package git-annex version 8.20210330	2021-03-30 14:33:28 -04:00
Joey Hess	4611813ef1	Fix bug importing from a special remote into a subdirectory more than one level deep Which generated unusual git trees that could confuse git merge, since they incorrectly had 2 subtrees with the same name. Root of the bug was a) not testing that at all! but also b) confusing graftdirs, which contains eg "foo/bar" with non-recursively read trees, which would contain eg "bar" when reading a subtree of "foo". It's worth noting that Annex.Import uses graftTree, but it really shouldn't have needed to. Eg, when importing into foo/bar from a remote, it's enough to generate a tree of foo/bar/x, foo/bar/y, and does not include other files that are at the top of the master branch. It uses graftTree, so it does include the other files, as well as the foo/bar tree. git merge will do the same thing for both trees. With that said, switching it away from graftTree would result in another import generating a new commit that seems to delete files that were there in a previous commit, so it probably has to keep using graftTree since it used it before. This commit was sponsored by Kevin Mueller on Patreon.	2021-03-26 16:04:36 -04:00
Joey Hess	f085ae4937	borg: Support importing files that are hard linked in the borg backup Note that a key with no size field that is hard linked will result in listImportableContents reporting a file size of 0, rather than the actual size of the file. One result is that the progress meter when getting the file will seem to get stuck at 100%. Another is that the remote's preferred content expression, if it tries to match against file size, will treat it as an empty file. I don't see a way to improve the latter behavior, and the former behavior is a minor enough problem. This commit was sponsored by Jake Vosloo on Patreon.	2021-03-26 13:29:34 -04:00
Joey Hess	31eb5fddf3	borg: Fix a bug that prevented importing keys of type URL and WORM Keys stored on the filesystem are mangled by keyFile to avoid problem chars. So, that mangling has to be reversed when parsing files from a borg backup back to a key. The directory special remote also so mangles them. Some other special remotes do not; eg S3 just serializes the key -- but S3 object names are not limited to filesystem valid filenames anyway, so a S3 server must not map them directly to files in any case. It seems unlikely that a borg backup of some such special remote will get broken by this change. This commit was sponsored by Graham Spencer on Patreon.	2021-03-26 12:07:00 -04:00
Joey Hess	537f9d9a11	Improved display of errors when accessing a git http remote fails. New error message: Remote foo not usable by git-annex; setting annex-ignore http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1 If git config parse fails, or the git config file is not available at the url, a better error message for that is also shown. This commit was sponsored by Mark Reidenbach on Patreon.	2021-03-24 14:19:32 -04:00
Joey Hess	4631d1ab56	Fix build with attoparsec-0.14 It changed parseOnly in the ByteString.Lazy module to take a lazy, not strict ByteString. In all these cases though, we actually had a strict ByteString, so the most efficient fix, which also happens to avoid needing ifdefs, is to use the non-lazy module instead. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-03-24 12:11:50 -04:00
Joey Hess	5d78cd9d08	Sped up git-annex init in a clone of an existing repository Seems that hasOrigin was never finding origin's git-annex branch, so a new one got created each time. And so then it later needed to merge the two branches, which is expensive. Added --no-track to git branch to avoid it displaying a message about setting up tracking branches. Of course there's no reason to make the git-annex branch a tracking branch since git-annex auto-merges it.	2021-03-23 15:23:13 -04:00
Joey Hess	798f685077	New annex.supportunlocked config Can be set to false to avoid some expensive things needed to support unlocked files. See my comment for why this only controls what init sets up, and not other behavior. I didn't bother with making the v5 upgrade code path look at this, though it easily could, because the docs say to run git-annex init after setting it to make it take effect.	2021-03-23 14:04:34 -04:00
Joey Hess	c68ba7d893	whereis: Don't include yt: prefix when showing url to content retrieved with youtube-dl I don't think this was really intentional behavior. It may be that it was useful to include it so it could be passed to rmurl, since without it rmurl would not actually remove the url. Since that was changed earlier today, now seems like a good time to clean up the display of these urls. This commit was sponsored by Jochen Bartl on Patreon.	2021-03-22 19:56:24 -04:00
Joey Hess	637229c593	fix fsck --from --all to not fall over trying to check required content fsck: When --from is used in combination with --all or similar options, do not verify required content, which can't be checked properly when operating on keys. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-03-22 15:08:07 -04:00
Joey Hess	5545e78a1e	Make --debug also enable debugging in child git-annex processes Especially necessary with stalldetection using child processes for transfers. This commit was sponsored by Jack Hill on Patreon.	2021-03-22 14:25:28 -04:00
Joey Hess	5d75cbcdcf	webdav: deal with buggy webdav servers in renameExport box.com already had a special case, since its renaming was known buggy. In its case, renaming to the temp file succeeds, but then renaming the temp file to final destination fails. Then this 4shared server has buggy handling of renames across directories. While that was already worked around for storing exports, by putting the temp files in the same directory as the final filename, it also affected renameExport when the file moves between directories. I'm not entirely clear what happens on the 4shared server when it fails this way. It kind of looks like it may rename the file to destination and then still fail. To handle both, when rename fails, delete both the source and the destination, and fall back to uploading the content again. In the box.com case, the temp file is the source, and deleting it makes sure the temp file gets cleaned up. In the 4shared case, the file may have been renamed to the destination and so cleaning that up avoids any interference with the re-upload to the destination.	2021-03-22 13:08:18 -04:00
Joey Hess	0af9d1dcb6	unregisterurl: remove all forms of an url, no matter what the downloader is set to unregisterurl: Fix a bug that caused an url to not be unregistered when it is claimed by a special remote other than the web. See commit `f175d4cc90` for rationale.	2021-03-22 12:17:17 -04:00
Joey Hess	f175d4cc90	rmurl: remove all forms of an url, no matter what the downloader is set to * rmurl: When youtube-dl was used for an url, it no longer needs to be prefixed with "yt:" in order to be removed. * rmurl: If an url is both used by the web and also claimed by another special remote, fix a bug that caused the url to not be removed. The youtube-dl change is a consequence of how the bug fix is implemented. But I also think it's the right thing to do. Consider that, before, git-annex addurl $url followed by git-annex rmurl $url would not remove the url in the case where youtube-dl was used. That was surprising behavior. In the unlikely case where a special remote claims an url, and it's been added using OtherDownloader, but it was also added already as a web url, it seems better for rmurl to remove both than to arbitrarily remove only one. And in the case the bug report was filed for, when an url was added as a web url, but a special remote now claims it, that should not prevent rmurl removing the web url. Calling setUrlMissing lets other callers of it behave differently. Probably the calls to it in eg, Remote.External and Remote.BitTorrent are fine, since they don't mangle the url and just remove what was provided, and the OtherDownloader form of a bittorrent url, respectively. I suspect unregisterurl needs to have a similar change made to rmurl, for similar reasons.	2021-03-22 12:09:15 -04:00
Joey Hess	9856e10d3c	call out behavior change	2021-03-22 11:34:23 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, webdav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	6481991208	export --json: Fill in the file field Like import was using ActionItemWorkTreeFile, it's ok to use it for export, even though it might not correspond with a file in the work tree. And renamed it to ActionItemTreeFile to make that clearer. Note that when an export has to rename files, it still uses ActionItemOther, so file will still be null in that case, but as no file is being transferred, that seems ok.	2021-03-12 14:11:31 -04:00
Joey Hess	1cb154f457	avoid importing deleting submodule import: When the previously exported tree contained a submodule, preserve it in the imported tree so it does not get deleted. The export exclude log, which was used for non-preferred content, now also includes the submodules. Since the log format is git ls-tree output, this does not break backwards compatibility.	2021-03-12 13:31:21 -04:00
Joey Hess	f2a425bd92	export: When a submodule is in the tree to be exported, skip it.	2021-03-12 12:29:18 -04:00
Joey Hess	a343ea76c8	releasing package git-annex version 8.20210310	2021-03-10 13:59:00 -04:00
Joey Hess	3bf789c68f	git on OSX dmg updated to fix CVE This mostly affects OSX and (possibly) Windows, but the Windows installer does not bundle git. The linux standalone builds are not updated yet pending debian stable getting a backport of the security fix, but the security hole is unlikely to affect linux as case-insensitive filesystems that support symlinks are a rarity on it. Using the linux standalone build on windows via WSL is another way it could be affected. This commit was sponsored by Brett Eisenberg on Patreon.	2021-03-10 13:53:11 -04:00
Joey Hess	60be1a7864	reorder	2021-03-10 10:15:45 -04:00
Joey Hess	1d7fa63149	Added support for git-remote-gcrypt's rsync URIs Which access a remote using rsync over ssh, and which git pushes to much more efficiently than ssh urls. There was some old partial support for rsync URIs from 2013, but it seemed incomplete, and did not use rsync over ssh. Weird. I'm not sure if there's any remaining benefit to using the non-rsync url forms with gcrypt, now that this is implemented? Updated docs to encourage using the rsync urls. This commit was sponsored by Svenne Krap on Patreon.	2021-03-09 15:58:09 -04:00
Joey Hess	e07eabbf7f	Fix support for local gcrypt repositories with a space in their URI Git.Remote.parseRemoteLocation had a hack to handle URIs that contained characters like spaces, which is something git unfortunately allows despite not being a valid URI. However, that hack looked for "//" to guess something was an URI, and these gcrypt URIs, being to a local path, don't contain that. So instead escape all illegal characters and check if the resulting thing is an URI. And that was already done by Git.Construct.fromUrl, so internally the gcrypt URI with a space looks like "gcrypt::foo%20bar" and that needs to be de-escaped when converting back from URI to local repo path. This change might also allow a few other almost-valid URIs to be handled as URIs by git-annex. None that contain "//" will change, and any behavior change should result in git-annex doing closer to a right thing than it did before, probably. This commit was sponsored by Noam Kremen on Patreon.	2021-03-09 12:49:51 -04:00
Joey Hess	e8065ee99d	close todo	2021-03-05 14:46:09 -04:00
Joey Hess	a14001785e	fix --branch combined with --unlocked or --locked Since it's using git ls-tree anyway, can just look at the file modes to see if they're unlocked or are symlinks.	2021-03-02 13:47:27 -04:00
Joey Hess	25e4ab7e81	Prevent combinations of options such as --all with --include Previously such nonsensical combinations always treated the matching option as if it didn't match. For now, made find --branch refuse matching options that need a filename, because one is not provided to them in a way they'll use. There's an open bug report to support it, but making it error out is better than the old behavior of not finding what it was asked to. Also, made --mimetype combined with eg --all work, by looking at the object file when operating on keys.	2021-03-01 16:25:23 -04:00
Joey Hess	eb594c710e	unregisterurl: New command Implemented by generalizing registerurl. Without the implicit batch mode of registerurl since that is only a backwards compatibility thing (see commit `1d1054faa6`).	2021-03-01 14:28:24 -04:00
Joey Hess	97ae474585	registerurl: Allow it to be used in a bare repository.	2021-03-01 14:03:03 -04:00
Joey Hess	a8b627d82b	uninit: Fix a small bug that left a lock file in .git/annex unannex using git queue caused the queue lock to be taken after uninit had cleaned out .git/annex. Flush the queue earlier to avoid.	2021-03-01 13:05:47 -04:00
Joey Hess	a942ed4bb9	Windows: Correct the path to the html help file for 64 bit build.	2021-02-24 13:19:42 -04:00
Joey Hess	d670346b22	releasing package git-annex version 8.20210223	2021-02-23 14:40:45 -04:00
Joey Hess	530e96b80e	fix unannex data overwrite bug unannex, uninit: When an annexed file is modified, don't overwrite the modified version with an older version from the annex This commit was sponsored by Mark Reidenbach on Patreon.	2021-02-22 13:35:00 -04:00
Joey Hess	62d5a73bdd	unannex, uninit: Avoid running git rm once per annexed file, for a large speedup.	2021-02-22 12:56:11 -04:00
Joey Hess	cddf2343b2	wording	2021-02-22 12:51:52 -04:00
Joey Hess	f44d4704c6	incremental checksum for local remotes This benchmarks only slightly faster than the old git-annex. Eg, for a 1 gb file, 14.56s vs 15.57s. (On a ram disk; there would certainly be more of an effect if the file was written to disk and didn't stay in cache.) Commenting out the updateIncremental calls make the same run in 6.31s. May be that overhead in the implementation, other than the actual checksumming, is slowing it down. Eg, MVar access. (I also tried using 10x larger chunks, which did not change the speed.)	2021-02-10 16:05:24 -04:00
Joey Hess	e24ddb8946	Bugfix: fsck --from a ssh remote did not actually check that the content on the remote is not corrupted Changing to the P2P protocol broke this, because preseedTmp copies the local copy of the object to the temp file, and then the P2P transfer sees the right length file and uses it as-is. When git-annex-shell is too old and rsync is used, it did verify the content, and when the local repo does not have the object it did verify the content.	2021-02-10 13:29:12 -04:00
Joey Hess	4b63e932f3	incremental checksum on upload to ssh or p2p	2021-02-10 12:41:05 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	fa3d71d924	Tahoe: Avoid verifying hash after download, since tahoe does sufficient verification itself See my comment in the next commit for some details about why Verified needs a hash with preimage resistance. As far as tahoe goes, it's fully cryptographically secure. I think that bup could also return Verified. However, the Retriever interface does not currently support that.	2021-02-09 13:42:16 -04:00
Joey Hess	bb8db94655	reorder	2021-02-08 17:54:29 -04:00
Joey Hess	3a66cd715f	avoid making absolute git remote path relative When a git remote is configured with an absolute path, use that path, rather than making it relative. If it's configured with a relative path, use that. Git.Construct.fromPath changed to preserve the path as-is, rather than making it absolute. And Annex.new changed to not convert the path to relative. Instead, Git.CurrentRepo.get generates a relative path. A few things that used fromAbsPath unnecessarily were changed in passing to use fromPath instead. I'm seeing fromAbsPath as a security check, while before it was being used in some cases when the path was known absolute already. It may be that fromAbsPath is not really needed, but only git-annex-shell uses it now, and I'm not 100% sure that there's not some input that would cause a relative path to be used, opening a security hole, without the security check. So left it as-is. Test suite passes and strace shows the configured remote url is used unchanged in the path into it. I can't be 100% sure there's not some code somewhere that takes an absolute path to the repo and converts it to relative and uses it, but it seems pretty unlikely that the code paths used for a git remote would call such code. One place I know of is gitAnnexLink, but I'm pretty sure that git remotes never deal with annex symlinks. If that did get called, it generates a path relative to cwd, which would have been wrong before this change as well, when operating on a remote.	2021-02-08 13:18:01 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	135757d64a	automatic stall detection annex.stalldetection can now be set to "true" to make git-annex do automatic stall detection when it detects a remote is updating its transfer progress consistently enough. This commit was sponsored by Luke Shumaker on Patreon.	2021-02-03 13:33:57 -04:00
Joey Hess	aec2cf0abe	addon commands Seems only fair, that, like git runs git-annex, git-annex runs git-annex-foo. Implementation relies on O.forwardOptions, so that any options are passed through to the addon program. Note that this includes options before the subcommand, eg: git-annex -cx=y foo Unfortunately, git-annex eats the --help/-h options. This is because it uses O.hsubparser, which injects that option into each subcommand. Seems like this should be possible to avoid somehow, to let commands display their own --help, instead of the dummy one git-annex displays. The two step searching mirrors how git works, it makes finding git-annex-foo fast when "git annex foo" is run, but will also support fuzzy matching, once findAllAddonCommands gets implemented. This commit was sponsored by Dr. Land Raider on Patreon.	2021-02-02 16:32:49 -04:00
Joey Hess	58216ef39d	Include libkqueue.h file needed to build the assistant on BSDs I suspect this is a bug in cabal sdist, because with Includes: Utility/libkqueue.h the file is not included, but putting it in extra-files does get it into the tarball.	2021-02-01 12:00:56 -04:00
Joey Hess	41bf440729	Fix build on openbsd. Thanks, James Cook for the patch.	2021-02-01 11:56:17 -04:00
Joey Hess	8d4eb2d34e	get: Improve output when getting a file fails showTriedRemotes lists the remotes it tried to access. So there's no need to list those again in "Try making some of these remotes available".	2021-01-29 15:11:19 -04:00
Joey Hess	c35fa6975b	fix handling of implicit and before parens Fix an oddity in matching options and preferred content expressions such as "foo (bar or baz)", which was incorrectly handled as if it were "(foo or bar) and baz" rather than the intended "foo and (bar or baz)" Seemed like a change to consume should be able to handle this case better, but I was having trouble writing it that way, so instead added a separate pass that inserts the implicit ands explicitly. Also added several test cases to make sure versions with and without explicit ands generate the same.	2021-01-28 13:51:07 -04:00
Joey Hess	6f78497572	When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked Missed this when implementing it because of the default case catching the new constructor. So, removed that default case to make sure future types of adjusted branches don't make the same mistake. Complicated by git-annex addurl --fast which adds the file whose content is not present, so it needs to stay unlocked when on such a branch. This commit was sponsored by Brock Spratlen on Patreon.	2021-01-28 12:47:46 -04:00
Joey Hess	e3224ff77d	formatLsTree did not use a tab where git does Fixed that, and made parserLsTree accept the space as well as tab. Fixes a reversion that made import of a tree from a special remote result in a merge that deleted files that were not preferred content of that special remote.	2021-01-28 12:36:37 -04:00
Joey Hess	a82aca67b8	releasing package git-annex version 8.20210127	2021-01-27 11:13:25 -04:00
Joey Hess	b372d962ae	Added GETGITREMOTENAME to external special remote protocol	2021-01-26 12:42:47 -04:00
Joey Hess	03b0b61018	wording	2021-01-25 17:40:08 -04:00
Joey Hess	47338bf270	support modifying and running git add on an unlocked file that used an URL key Avoids the smudge --clean filter failing because URL keys do not support genKey. Instead the modified content will be added using the default backend. This commit was sponsored by Jochen Bartl on Patreon.	2021-01-25 17:37:16 -04:00
Joey Hess	34a535ebea	adjust: Fix some bad behavior when unlocked files use URL keys. This avoids the smudge --clean filter failing on the URL keys. git checkout runs the post-checkout hook, which runs smudge --update. That populates all the pointer files, but it neglected to store their inode caches in the keys db. With that done, and the keys db flushed before smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap check can tell the file is not modified, so will not try to re-ingest it, which does not work with URL keys because they do not support genKey. It also seems possible that the isUnmodifiedCheap was also failing for non-URL keys, which would cause them to be re-ingested, leading to a lot of extra work. I have not verified that, but don't see why it wouldn't have happened. So this probably also speeds up checking out adjusted branches. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-01-25 17:25:42 -04:00
Joey Hess	5c7e6629cf	Fix a bug in view filename generation when a metadata value ended with "/" Or ":" or "\" on Windows, eg "c:" again.	2021-01-22 14:05:14 -04:00
Joey Hess	95cd49abdb	fix a bug that prevented git-annex init from working in a submodule This is probably a reversion, but not sure what caused it. By the time Annex.Init runs fixupUnusualReposAfterInit, another git-annex process has at least sometimes already done the necessary fixups. (Eg, one run indirectly by a git command.) But since the Repo is cached, it doesn't realize and does them again. So, avoid crashing when git config --unset fails. This commit was sponsored by Jack Hill on Patreon.	2021-01-21 15:33:15 -04:00
Joey Hess	73df633a62	omit inode from ContentIdentifier for directory special remote Directory special remotes with importtree=yes now avoid unnecessary overhead when inodes of files have changed, as happens whenever a FAT filesystem gets remounted. A few unusual edge cases of modifications won't be detected and imported. I think they're unusual enough not to be a concern. It would be possible to add a config setting that controls whether to compare inodes too, but does not seem worth bothering the user about currently. I chose to continue to use the InodeCache serialization, just with the inode zeroed. This way, if I later change my mind or make it configurable, can parse it back to an InodeCache and operate on it. The overhead of storing a 0 in the content identifier log seems worth it. There is a one-time cost to this change; all directory special remotes with importtree=yes will re-hash all files once, and will update the content identifier logs with zeroed inodes. This commit was sponsored by Brett Eisenberg on Patreon.	2021-01-19 13:15:07 -04:00
Joey Hess	2aa4fab62a	avoid crashing when there are remotes using unparseable urls Including the non-standard URI form that git-remote-gcrypt uses for rsync. Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port number. But git could have a remote with that, it would try to run git-remote-ook to handle it. So, git-annex has to allow for such things, rather than crashing. This commit was sponsored by Luke Shumaker on Patreon.	2021-01-18 14:59:08 -04:00
Joey Hess	5193aae385	Bug fix: Fix tilde expansion in ssh urls when the tilde is the last character in the url. Thanks, Grond for the patch.	2021-01-18 12:22:48 -04:00
Joey Hess	6a30d04ece	Bug fix: export with -J could fail when two files had the same content. Exporting is done inside a call to writeLockDbWhile which guarantees there is only one process uploading to a given ExportLocation.	2021-01-13 14:50:48 -04:00
Joey Hess	5e39b7eb8d	Windows: Work around win32 length limits when dealing with lock files	2021-01-13 14:38:35 -04:00
Joey Hess	1e65d1b9af	merged fix from kyle	2021-01-07 13:47:36 -04:00
Joey Hess	c8b1fa67b4	Behavior change: --trust-glacier option no longer overrides trust Since that can lead to data loss, which should never be enabled by an option other than --force. This commit was sponsored by Jake Vosloo on Patreon.	2021-01-07 10:37:43 -04:00
Joey Hess	2bf34fc17f	Behavior change: --trust option no longer overrides trust Since that can lead to data loss, which should never be enabled by an option other than --force. I suppose that using --trust was in some situation, safer than --force, because it doesn't entirely disable checking for data loss, but only disables checking involving data that is on the specified repository. But it seems better to be able to say that data loss only happens with --force. This commit was sponsored by Graham Spencer on Patreon.	2021-01-07 10:34:57 -04:00
Joey Hess	6a0030a110	Behavior change: git-annex trust now needs --force Since unconsidered use of trusted repositories can lead to data loss. Trusted has always been this way, but it used to be acceptable for git-annex to be set up so that data could be lost without using --force, and most or all other ways that can happen have already been eliminated. This commit was sponsored by Mark Reidenbach on Patreon.	2021-01-07 10:09:39 -04:00
Joey Hess	715c6013d4	wording	2021-01-06 14:26:39 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	428d228ee5	docs for requirednumcopies Not implemented yet.	2021-01-05 14:22:44 -04:00
Joey Hess	a3a19518d8	fix --time-limit It got broken in several ways by the streaming seeking optimisations around version 8.20201007. Moved time limit checking out of the matcher, which was a hack in the first place. So everywhere that uses Limit.getMatcher needs to check time limit. Well, almost everywhere. Command.Info uses it, but it does not make sense to time limit getting info. And Command.MultiCast uses it just to build up a list of files that then get passed to a command, so it would never have hit the timeout in a useful way. This implementation is a little more expensive when at time limit than necessary, since it continues seeking only to discard everything after the time limit. I did try making it close the file handles to force a faster shutdown, but that didn't work and hung. Could certainly be improved somehow, but seeking is probably not the expensive bit when a time limit is hit, so this seems acceptable for now.	2021-01-04 15:57:11 -04:00
Joey Hess	5ce61c6b2a	add: Significantly speed up adding lots of non-large files to git * add: Significantly speed up adding lots of non-large files to git, by disabling the annex smudge filter when running git add. * add --force-small: Run git add rather than updating the index itself, so any other smudge filters than the annex one that may be enabled will be used.	2021-01-04 13:12:28 -04:00
Joey Hess	7d843e909d	releasing package git-annex version 8.20201129	2020-12-29 13:51:40 -04:00
Joey Hess	7916fc98a3	graft in imported tree to avoid gc Fix a bug that could prevent getting files from an importtree=yes remote, because the imported tree was allowed to be garbage collected.	2020-12-23 14:27:38 -04:00
Joey Hess	cd4c68924b	merged borg Still a couple related todos, but it's basically usable now.	2020-12-22 16:22:44 -04:00
Joey Hess	6b13574827	Windows: include= and exclude= containing '/' will also match filenames that are written using '\' And vice-versa, but it's better to use '/' for portability. Notably, standardPreferredContent contains "archive/*" and that might not match if the filename ends up coming in with the slashes the other way around.	2020-12-15 12:39:34 -04:00
Joey Hess	3519f1ab7f	reorg	2020-12-15 12:12:03 -04:00
Joey Hess	017ce1b811	clarify	2020-12-15 12:09:27 -04:00
Joey Hess	6c890d62f6	initremote: Prevent enabling encryption with exporttree=yes/importtree=yes I do think this was a reversion, but I have not tracked back to what version. While involving the remote config, it's not the same class of problems that I kept having to chase down for a while after the remote config parser reworking.	2020-12-15 12:08:08 -04:00
Joey Hess	ed68a2166d	importfeed: Avoid using youtube-dl when a feed does not contain an enclosure, but only a link to an url which youtube-dl does not support This is common in some feeds, which might mix some items with enclosures, with others that link to posts or whatever. Before this, it would try to use youtube-dl and fail, or if youtube-dl was not allowed, it would incorrectly complain that an url was supported by youtube-dl.	2020-12-15 01:13:21 -04:00
Joey Hess	16315b7812	typo	2020-12-14 21:35:31 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktree files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	75acf5f440	improve some edge cases around partial initialization * Guard against running in a repo where annex.uuid is set but annex.version is not set, or vice-versa. * Avoid autoinit when a repo does not have annex.version or annex.uuid set, but has a git-annex objects directory, suggesting it was used by git-annex before.	2020-12-14 13:17:43 -04:00
Joey Hess	6a11b6fab8	Support special remotes that are configured with importtree=yes but without exporttree=yes There was no particular reason not to support this, other than maybe a lack of a use case. One use case would of course be a remote that you want to avoid overwriting content on. A new use case is the idea of importing from backups, eg borg, where exporting is not necessarily supported at all. This commit was sponsored by Brock Spratlen on Patreon.	2020-12-10 13:17:40 -04:00
Joey Hess	41f2c308ff	stall detection is working New config annex.stalldetection, remote.name.annex-stalldetection, which can be used to deal with remotes that stall during transfers, or are sometimes too slow to want to use. This commit was sponsored by Luke Shumaker on Patreon.	2020-12-08 15:22:18 -04:00
Joey Hess	0540e987b3	improve p2p protocol handling of requested object not available Avoid spurious "verification of content failed" message when downloading content from a ssh or tor remote fails due to the remote no longer having a copy of the content. The P2P protocol already handled this case by sending DATA 0, followed by VALID. But VALID was not really right, because the data is not the requested data. So, send DATA 0, followed by INVALID. Old versions of git-annex handle INVALID the same as VALID in this case. Now new versions avoid displaying an incorrect message. It would be better for the P2P protocol to have a different way to indicate this, like perhaps sending INVALID without DATA. But that would be a breaking change and need a new protocol version. Since INVALID already is part of the protocol and already needs to be handled, using it for this special case too seems ok, and avoids the complication of another protocol version. This commit was sponsored by Jochen Bartl on Patreon.	2020-12-01 16:05:55 -04:00
Joey Hess	92136284b1	avoid hGetMetered 0 closing the handle This is an edge case, which happened to be triggered by the P2P protocol seeing DATA 0. When reading 0 bytes, getting an empty string does not mean the handle has reached EOF. I verified there was in fact a bug, where get of an empty file followed by another file would get the empty file and then fail with "handle is closed". This fixes it. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-01 15:39:22 -04:00
Joey Hess	7776677a5f	Fix hang on shutdown of external special remote using ASYNC protocol extension. Reversion introduced in version 8.20201007, one release after the 1st release with the extension. Surprisingly, hClose can hang if another thread is reading from the handle. This is because it uses takeMVar. The use of cancel here does mean that, if receiveMessageAddonProcess or Remote.External.AsyncExtension.receiveloop allocated some resource in a non-async-exception safe way, they might not get a chance to clean it up. They do not appear to, and anyway, this only happens when git-annex is shutting down, so any resource that did leak would not be a problem. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-30 13:04:02 -04:00
Joey Hess	dad8442572	releasing package git-annex version 8.20201127	2020-11-27 12:57:02 -04:00
Joey Hess	ff4354c6e4	Made the test suite significantly less noisy Only displaying git-annex and git command output when something went wrong. A few could still leak stderr. These include the couple of calls to readProcess, which reads stdin but lets stderr through. But they don't leak any usually, so probably only would when failing anyway. Currently, there is no excess output at all! This commit was sponsored by Brock Spratlen on Patreon.	2020-11-24 14:15:40 -04:00
Joey Hess	88cef18fac	upgrade: Support an edge case upgrading a v5 direct mode repo where nothing had ever been committed to the head branch This commit was sponsored by Jack Hill on Patreon.	2020-11-24 12:31:17 -04:00
Joey Hess	04dca96710	changelog	2020-11-19 14:46:54 -04:00
Joey Hess	b90b9b936d	don't rely on exception for http 416 Fix a bug that could make resuming a download from the web fail when the entire content of the file is actually already present locally. What a mess that Request can throw exceptions or not, depending on how it's configured. Makes it very hard if you need to handle some specific http status codes in a function like this! Implementing everything two ways did not seem appealing, if possible at all, so I decided to override the Request if it did come configured to throw exception on non-2xx http status. Other exceptions, like from http-client-restricted, or due to a redirect to a non-http url, still get thrown. This commit was sponsored by Luke Shumaker on Patreon.	2020-11-19 14:44:42 -04:00
Joey Hess	b3c88da181	fix windows assistant upgrade glitch Prevent windows assistant from trying (and failing) to upgrade itself, which has never been supported on windows. The new windows build is made with UPGRADE_LOCATION set, which enabled this code path that had never run on windows before, and doesn't work. I don't want to try to support self-upgrade on windows, or generally on other OS's than the ones where it's working, so added a check for that. This way the build can keep setting UPGRADE_LOCATION and if some later git-annex does learn how to upgrade itself on some OS, it won't need changing the build setup.	2020-11-19 12:50:25 -04:00
Joey Hess	4b739fc460	Fix build on Windows Thanks to bug reporter for the patch.	2020-11-19 12:33:00 -04:00
Joey Hess	043eee0cb5	update	2020-11-18 15:16:49 -04:00
Joey Hess	6b63278f31	init: When writing hook scripts, set all execute bits, not only the user execute bit	2020-11-17 13:31:12 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	26cf26caca	Merge branch 'master' into symlink-missing	2020-11-16 10:03:12 -04:00
Joey Hess	5a8d01f63e	examinekey: Added a "file" format variable For consistency with find, and for easier scripting.	2020-11-16 09:59:11 -04:00
Joey Hess	864af53a2d	releasing package git-annex version 8.20201116	2020-11-16 09:38:29 -04:00
Joey Hess	e66b7d2e1b	rename to --unlock-present and better reverse adjusting An --unlock-present branch reverses back to a branch where all files that get modified or renamed become locked, even if they were originally unlocked. This is the same way that reversing a --unlock branch works, and the new name makes that commonality more clear.	2020-11-13 14:56:43 -04:00
Joey Hess	3899e216af	Merge branch 'master' into symlink-missing	2020-11-13 14:19:45 -04:00
Joey Hess	a30030c4a6	move: Fix a regression in the last release that made move --to not honor numcopies settings This commit was sponsored by Svenne Krap on Patreon.	2020-11-13 14:19:32 -04:00
Joey Hess	c8e49c5ef5	git-annex adjust --lock-missing Like --hide-missing the branch does not get updated when content availability changes. Seems to basically work, but sync does not update it yet. Also, when a file is present and so unlocked, git mv followed by git-annex sync results in the basis branch being updated to contain the file with the new name, unlocked. This seems different than what happens in an adjusted unlocked branch, where the commit propagates back locked. Probably the reverse adjustment code needs to be improved to handle this case.	2020-11-13 13:39:44 -04:00
Joey Hess	7566aa6bc5	examinekey: Added --migrate-to-backend Note that, the way the SeekInput parser is written to support batch mode, it's actually possible to do git-annex examinekey "SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E While that might be kind of useful to support multiple migrations not using batch mode, I have not documented it. It would be better to take pairs of key and file in that case.	2020-11-12 14:09:14 -04:00
Joey Hess	12e32d1dee	examinekey: Added two new format variables: objectpath and objectpointer	2020-11-12 13:02:31 -04:00
Joey Hess	92b7b1964d	add warning on add of annex link Warn when adding an annex symlink or pointer file that uses a key that is not known to the repository, to prevent confusion if the user has copied it from some other repository. This commit was sponsored by Jake Vosloo on Patreon.	2020-11-10 12:10:51 -04:00
Joey Hess	d032b0885d	use MatchingKey when a Key is known This fixes a bug where a file that was not preferred content could be transferred to a remote. This happened when the file got deleted after the sync started running. The only time checkMatcher is run without a Key is in calls to checkFileMatcher, which are only done by add, addurl, import, and smudge --clean. Those won't be affected by this kind of race. Anything else that might be precaching and have a similar race as sync will also be fixed, but I don't know if it actually affected anything other than sync. As well as fixing a bug, this also probably makes sync and --auto faster by avoiding the redundant key lookup. This commit was sponsored by Graham Spencer on Patreon.	2020-11-09 15:17:22 -04:00
Joey Hess	2dabd4cc2d	releasing package git-annex version 8.20201103	2020-11-03 11:53:11 -04:00
Joey Hess	9252f86b2e	view: Fix a reversion in 8.20200522 that broke entering or changing views. Commit `2dc7b5186a` messed up indentation. This commit was sponsored by Noam Kremen on Patreon.	2020-11-02 14:47:08 -04:00
Joey Hess	7245a9ed53	Improve shutdown process for external special remotes and external backends Make sure to relay any remaining stderr from the process after it has shut down, rather than closing stderr just before shutdown. This avoids a situation where the process is still running and tries to write to stderr, getting a SIGPIPE. And, it ensures that no stderr output is lost. This may fix a problem encountered by datalad on windows, where it hangs during the external special remote shutdown. Before commit `a49d300545`, it closed stdin and stdout, but left stderr open, and never killed the stderr waiter thread, which presumably exited on its own. For async exception safety, do need to at least make sure that thread gets waited on, as that commit does, but it introduced this problem. Note that, the process's stdout is closed before waiting on it. It's too late for anything it writes to stdout to be processed, and since we're not going to consume any such writes, this avoids the process getting blocked writing to stdout due to us not reading what it's buffered. This does mean that if the process writes to stdout too late, it will get a SIGPIPE. (This was already the case before the above-mentioned commit.) In practice, I think only the protocol's ERROR is allowed to be sent at a point where this could happen.	2020-11-02 12:56:35 -04:00
Joey Hess	64e7bac810	view: Avoid using ':' from metadata when generating a view Because it's a special character on Windows ("c:"). Use same technique already used for '/' and '\'. I didn't record how I generated their encoded forms before, so am sure there was a better way, but the way I did it now is to look at ghci> encodeFilePath "∕" "\226\136\149" And then the difference from that to "\56546\56456\56469" is adding 56320 to each, to get up to the escaped code plane. See comment for why I think handling ':' is ok, but that other illegal windows filenames won't. Note that, this should be enough to make the test suite always work. Other windows illegal filenames will fail at checkout time when it tries to put the illegal filename on the filesystem.	2020-10-26 15:38:08 -04:00
Joey Hess	5e458e8ac6	changelog	2020-10-26 13:43:40 -04:00
Joey Hess	f3070d2d7d	Windows build changed to one done by the datalad-extensions project using Github actions This is a cleaner build than on Jenkins because the whole environment setup is handled by the CI config, at least up to the point of "get a random bag of Windows bytes". Also, the Jenkins autobuilder has been intermittently failing for a long time, not due to any problem with git-annex but just a failure to clean up directories. Also, this build runs the test suite, and it is (mostly) passing. Test suite always failed in the jenkins environment. Also, this build includes libmagic. Here is the build workflow used by github actions: https://github.com/datalad/datalad-extensions/blob/master/.github/workflows/build-git-annex-windows.yaml The libmagic build has its own workflow: https://github.com/datalad/file-windows/blob/master/.github/workflows/build.yml (Also cleaned up some windows build cruft I don't use anymore.) There is no build-version file to link to. I've opened a todo requesting one: https://github.com/datalad/datalad-extensions/issues/55	2020-10-26 13:17:23 -04:00
Joey Hess	8fda1ef0fa	document version for --force-large/--force-small also fix the wrong name in the changelog	2020-10-26 11:34:30 -04:00
Joey Hess	a108b00b33	testremote: Display exceptions when tests fail, to aid debugging	2020-10-23 15:41:57 -04:00
Joey Hess	681313dfd4	deal with .git pointer file in Git.CurrentRepo This fixes the bug. Note, it's only done when GIT_DIR is set. When it's not set, Git.Construct already handled it. This is why it was only noticed with this git submodule command. This commit was sponsored by Brett Eisenberg on Patreon.	2020-10-23 14:56:12 -04:00
Joey Hess	0736383e98	Fix bug that prevented linux standalone bundle from working on a fresh install Bug was introduced in version 8.20201007, lost a necessary mkdir. This commit was sponsored by Noam Kremen on Patreon.	2020-10-23 12:19:40 -04:00
Joey Hess	dad4be97c2	speculatively use remote's configured chunk size as a fallback When a special remote has chunking enabled, but no chunk sizes are recorded (or the recorded ones are not found), speculatively try chunks using the configured chunk size. This makes eg, git-annex fsck --from remote be able to fix up the location log of a file that the git-annex branch does not indicate is stored on the remote. Note that fsck does not fix up the chunk log to indicate the chunk size. So, changing the chunk config of the remote after that will still prevent accessing the chunks stored on it. Maybe fsck should, but I wanted to start with this and see if it's needed.	2020-10-22 13:11:06 -04:00
Joey Hess	0133b7e5a8	move: Improve resuming a move that was interrupted after the object was transferred In cases where numcopies checks prevented the resumed move from dropping the object from the source repository, it now relies on a log of recent moves to replicate the behavior of the interrupted command. Performance: Probably noticeable impact, since it has to add to the log, check the log, and remove from the log. Seems worth it to avoid this annoying edge case. The log functions are pretty well optimised to avoid unnecessary work. A performance improvement to make later would be to avoid cleanup doing anything if it's not written to the log file, and has confirmed that the log file does not contain the log line. This commit was sponsored by Jake Vosloo on Patreon.	2020-10-21 10:31:56 -04:00
Joey Hess	7036d0a4c1	add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan This commit was sponsored by Jochen Bartl on Patreon.	2020-10-19 15:36:18 -04:00
Joey Hess	9a5cd96f0d	Fix a memory leak introduced in the last release The problem was this line: cleanup = and <$> sequence (map snd v) That caused all of v to be held onto until the end, when the cleanup action was run. I could not seem to find a bang pattern that avoided the leak, so I resorted to an IORef, rather clunky, but not a performance problem because it will only be written once per git ls-files, so typically just 1 time. This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-13 16:31:01 -04:00
Joey Hess	d54dd0ef9c	Fix build on Windows with network-3 inet_addr was removed, but all this needs is localhost, so hardcoding it should work fine. It may be that this windows ifdef is no longer needed. It was added in 2013 with a note that getAddrInfo didn't work on windows, but it seems likely such a problem would have been fixed since.	2020-10-08 10:50:39 -04:00
Joey Hess	cf33be21ac	releasing package git-annex version 8.20201007	2020-10-07 14:10:56 -04:00
Joey Hess	20f86e43f7	Fix a build failure on Windows.	2020-10-07 12:04:54 -04:00
Joey Hess	e0ca1236ee	runshell: Update files atomically when preparing to run git-annex This does not make it entirely idempotent, but it's a start.	2020-10-05 13:38:34 -04:00
Joey Hess	cd9a60bc7d	runshell: Fix an edge case where rm errors were sent to stdout, which could confuse things parsing git-annex output.	2020-10-05 12:44:40 -04:00
Joey Hess	5555697ae6	Enable building with git-annex benchmark by default Only turning it off when the criterion library is not installed. Not enabled for osx or i386ancient yet since that will need some investigation to update their respective stack.yaml files.	2020-10-02 13:57:10 -04:00
Joey Hess	37426920d8	Fix build with Benchmark build flag Broke a while ago during optimisation work, and not noticed since the flag is disabled by default. This commit was sponsored by Brock Spratlen on Patreon.	2020-10-02 13:30:24 -04:00
Joey Hess	c56efbbdb6	import: Check gitignores when importing trees from special remotes It seemed best to do this, for consistency with every other way files can get into a git-annex repo. Although it's just a bit strange that a local .gitignore file affects the pseudo-commits made for the remote that's imported from. This commit was sponsored by Brett Eisenberg on Patreon.	2020-09-30 10:41:59 -04:00
Joey Hess	4c32499e82	Parse youtube-dl progress output Which lets progress be displayed when doing concurrent downloads. Among other things, like --json-progress etc. The youtube-dl output is no longer displayed, except for any errors. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-29 17:53:48 -04:00
Joey Hess	084b502c7a	httpalso: Support being used with special remotes that do not have encryption= in their config.	2020-09-29 13:56:27 -04:00
Joey Hess	b2cf284d2a	upgrade: Avoid an upgrade failure of a bare repo in unusual circumstances	2020-09-29 13:45:14 -04:00
Joey Hess	1610d94776	addurl: Avoid a redundant git ignores check for speed Ensure that checkCanAdd is used everywhere a file is added to git, so git add is run with -f, presumably avoiding the work it would usually do to check ignores.	2020-09-29 13:00:41 -04:00
Joey Hess	658ea7ca3c	sync --no-content import from directory special remote sync: When run without --content, import without copying from importtree=yes directory special remotes. (Other special remotes may support this later as well.) This commit was sponsored by Svenne Krap on Patreon.	2020-09-28 15:29:08 -04:00
Joey Hess	15c1ee16d9	import --no-content: Check annex.largefiles Import small files into git, the same as is done when importing with content. Which means, for small files, --no-content does download them. If the largefiles expression needs the file content available (due to mimetype or mimeencoding being used), the import will fail. This commit was sponsored by Jake Vosloo on Patreon.	2020-09-28 13:28:57 -04:00
Joey Hess	ace02f41b0	seek: defer matcher check until more info is known Sped up seeking for files to operate on, when using options like --copies or --in, by around 20%. Benchmark showed an increase for --copies from 155 seconds to 121 seconds, and --in remote will be similar to that. For --in here, the speedup was less, 5-10% or so. (both warm cache) This commit was sponsored by Jack Hill on Patreon.	2020-09-24 17:59:12 -04:00
Joey Hess	d89984b121	sync --all avoid unnecessary first pass Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	68f9766544	Improve --debug output to show pid of processes that are started and stopped getPid returns Nothing if the process has already been stopped, and in that case, the pid will not be displayed. I think that would only happen if waitForProcess or similar gets called more than once on the same process handle though. getPid on unix has an overhead of only a MVar read. On Windows it needs to make a syscall, so will be probably more expensive. While the added expense happens even when debug logging is disabled, it should be small enough compared with the overhead of starting a process that it's not a problem. (It does occur to me that a debugM that took an IO String could only run it when debugging is really enabled, which would improve performance. It does not seem possible to use the current hslogger interface to do that though; it does not expose the information that would be needed.)	2020-09-24 12:39:57 -04:00
Joey Hess	6a5e0cbfc7	Improve the "Try making some of these repositories available" message With some hints for the user for what to do. Took care to avoid changing the json output. It would have been ok to add the new separated lists to it, in addition to the old list, but I didn't do that because I didn't see much point.	2020-09-22 14:10:30 -04:00
Joey Hess	d0b06c17c0	Added --no-check-gitignore option for finer grained control than using --force. add, addurl, importfeed, import: Added --no-check-gitignore option for finer grained control than using --force. (--force is used for too many different things, and at least one of these also uses it for something else. I would like to reduce --force's footprint until it only forces drops or a few other data losses. For now, --force still disables checking ignores too.) addunused: Don't check .gitignores when adding files. This is a behavior change, but I justify it by analogy with git add of a gitignored file adding it, asking to add all unused files back should add them all back, not skip some. The old behavior was surprising. In Command.Lock and Command.ReKey, CheckGitIgnore False does not change behavior, it only makes explicit what is done. Since these commands are run on annexed files, the file is already checked into git, so git add won't check ignores.	2020-09-18 13:19:13 -04:00
Joey Hess	922621301a	Serialize use of C magic library, which is not thread safe. This fixes failures uploading to S3 when using -J. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-17 17:27:42 -04:00
Joey Hess	83df401d93	Merge branch 'batchasync' into master	2020-09-16 13:02:58 -04:00
Joey Hess	877ef84a1b	support --batch -J --batch combined with -J now runs batch requests concurrently for many commands. Before, the combination was accepted, but did not enable concurrency. Since the output of batch requests can be in any order, --json with the new "input" field is recommended to be used, to determine which batch request each response corresponds to. If --json is not used, batch mode still runs concurrently, using the usual concurrent-output. That will not be very useful for most batch mode users, probably, but who knows. If a program was using --batch -J before, and was parsing non-json output, this could break it. But, it was relying on git-annex not supporting concurrency despite it being enabled, so it should have expected concurrent output. So, I think that's ok. annex.jobs does not enable concurrency in --batch mode, because that would confuse programs that use --batch but don't expect concurrency.	2020-09-16 12:10:37 -04:00
Joey Hess	fcf5d11c63	add "input" field to json output The use case of this field is mostly to support -J combined with --json. When that is implemented, a user will be able to look at the field to determine which of the requests they have sent it corresponds to. The field typically has a single value in its list, but in some cases multiple values (eg 2 command-line params) are combined together and the list will have more. Note that json parsing was already non-strict, so old git-annex metadata --json --batch can be fed json produced by the new git-annex and will not stumble over the new field.	2020-09-15 16:22:44 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	5844a54869	aws-0.22 improved its support for setting etags, which improves support for versioned S3 buckets. Remove placeholder version number I used when implementing the feature in aws. This commit was sponsored by Ethan Aubin.	2020-09-14 18:37:49 -04:00
Joey Hess	1a785d05c0	releasing package git-annex version 8.20200908	2020-09-08 14:20:47 -04:00
Joey Hess	dcaa1c1cc9	reorder	2020-09-08 12:54:17 -04:00
Joey Hess	6ea511beb4	Removed the S3 and WebDAV build flags So these special remotes are always supported. IIRC these build flags were added because the dep chains were a bit too long, or perhaps because the libraries were not available in Debian stable, or something like that. That was long ago, those reasons no longer apply, and users get confused when builtin special remotes are not available, so it seems best to remove the build flags now. If this does cause a problem it can be reverted of course.. This commit was sponsored by Jochen Bartl on Patreon.	2020-09-08 12:42:59 -04:00
Joey Hess	62372ee052	resolvemerge: Improve cleanup of cruft left in the working tree by a conflicted merge This commit was sponsored by Jake Vosloo on Patreon.	2020-09-07 16:50:27 -04:00
Joey Hess	d120c73302	sync, assistant: When merge.directoryRenames is not set, default it to "false" Works better with automatic merge conflict resolution than git's usual default of "conflict". This is not done when automatic merge conflict resolution is disabled. This commit was sponsored by Mark Reidenbach on Patreon.	2020-09-07 13:50:58 -04:00
Joey Hess	69053a93a2	resolvemerge: Improve cleanup of files that were deleted by one side of a conflicted merge, and modified by the other side This case was handled by cleanConflictCruft, but only when the annexed file's object was present. When not present, it left the annexed file with the original name, not checked into git, while adding the variant file. So, add an explicit deletion of the deleted file in this case. My specific case where this happened actually involves merge.directoryRenames=conflict. After a merge involving that, the situation was the file appears as "added by them", because that caused the file that they added to be moved into a directory we renamed. That case is the same as them adding a modified version of the file, while we deleted it. (Except for the history of the file, since it's a new file, but this doesn't look at history.) This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-09-07 12:25:57 -04:00
Joey Hess	e36bae74da	Exposed annex.forward-retry git config One reason is, 5 is an arbitrary number so ought to be configurable. The real reason though, is I wanted to make the man page explain when forward retry can override annex.retry, and having a config made the man page easier to write.	2020-09-04 15:16:40 -04:00
Joey Hess	2bb933eb60	import: Retry downloads that fail Also, using the transfer machinery for this makes eg, git-annex info show in-progress imports, and makes --notify-start/finish work.	2020-09-04 13:54:05 -04:00
Joey Hess	46eb48d7c0	Retry transfers to exporttree=yes remotes same as for other remotes The comment about noRetry is not well-justified, because transfers to many remotes cannot be resumed, but retries are still allowed for those.	2020-09-04 13:24:08 -04:00
Joey Hess	1d244bafbd	Limit retrying of failed transfers when forward progress is being made to 5 To avoid some unusual edge cases where too much retrying could result in far more data transfer than makes sense.	2020-09-04 12:46:37 -04:00
Joey Hess	6e9a4f50f3	make viaTmp honor umask Fixed several cases where files were created without file mode bits that the umask would usually set. This included exports to the directory special remote, torrent files used by the bittorrent special remote, hooks written by git-annex init, and some log files in .git/annex/ Audited all calls, looking for ones that didn't want the umask bits to be set. All such turned out to already set the specific restrictive file mode they wanted.	2020-09-02 14:54:07 -04:00
Joey Hess	8656afd3e1	rename http special remote to httpalso "http" was too generic and easy to confuse with web. The new name makes clear it's used in addition to some other remote. And other protocols can use the same naming scheme.	2020-09-02 10:41:53 -04:00
Joey Hess	571ec900ac	Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https. With automatic layout learning!	2020-09-01 15:16:35 -04:00
Joey Hess	41ebed3941	Support git remotes where .git is a file, not a directory Eg when --separate-git-dir was used, and core.symlinks=false. This commit was sponsored by Brock Spratlen on Patreon.	2020-08-28 15:08:14 -04:00
Joey Hess	cde3e5eb0c	test: Stop gpg-agent daemons that are started for the test framework's gpg key They normally shutdown when the GNUPGHOME directory is deleted, but on NFS they keep the directory from being deleted. And also, this avoids a number of them piling up while the test suite is running.	2020-08-28 14:28:42 -04:00
Joey Hess	b68f214312	Display a message when git-annex has to wait for a pid lock file held by another process	2020-08-26 13:05:34 -04:00
Joey Hess	7bdb0cdc0d	add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess Fixes reversion in 8.20200617 that made annex.pidlock being enabled result in some commands stalling, particularly those needing to autoinit. Renamed runsGitAnnexChildProcess to make clearer where it should be used. Arguably, it would be better to have a way to make any process git-annex runs have the env var set. But then it would need to take the pid lock when running any and all processes, and that would be a problem when git-annex runs two processes concurrently. So, I'm left doing it ad-hoc in places where git-annex really does run a child process, directly or indirectly via a particular git command.	2020-08-25 14:57:49 -04:00
Joey Hess	6b0532e532	wording	2020-08-25 14:47:17 -04:00
Joey Hess	2ca1ff62dc	addurl --file youtube-dl reversion fix addurl: Fix reversion in 7.20190322 that made --file not be honored when youtube-dl was used to download media. `8758f9c561` was on the right track, but missed that | otherwise prevented the code it added from being used. Also, refactored out a common function. This commit was sponsored by Graham Spencer on Patreon.	2020-08-25 12:56:45 -04:00
Joey Hess	27329f0bb1	stack.yaml: Updated to lts-16.10 Needs stack version 2.3 to build, which has only recently made it into debian unstable. This commit was sponsored by Jake Vosloo on Patreon.	2020-08-24 14:11:37 -04:00
Joey Hess	f241a3cd3d	Display warning when external special remote does not start up properly, or is not usable I'm sure this used to work, but somewhere along the line something or things (getCost and getAvailability I think, probably others) started catching the exception and not displaying it. So, show warnings.	2020-08-14 15:38:31 -04:00
Joey Hess	05b2b46a82	async extension done	2020-08-14 15:24:34 -04:00
Joey Hess	020e588262	reorder	2020-08-10 16:18:35 -04:00
Joey Hess	bcbdada8bf	fixed	2020-08-10 13:12:55 -04:00
Joey Hess	506ffea5e6	stop symlink check once the top of the working tree is reached Avoid complaining that a file "is beyond a symbolic link" when the filepath is absolute and the symlink in question is not actually inside the git repository. This assumes that inodes remain stable while the command is running. I think they always will, the filesystems where they are unstable change them across mounts. (If inodes were not stable, it would just complain about symlinks in the path that are not inside the working tree.) (On windows, I don't want to assume anything about inodes, they could be random numbers for all I know. But if they were, this would still be ok, as long as windows doesn't have symlinks that are detected by isSymbolicLink. Which seems a fair bet.)	2020-08-06 20:14:30 -04:00
Joey Hess	283d2f85d1	importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_' sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was running it on parts of the template. So eg the leading '.' in the extension got sanitized. Note the added case for sanitizeLeadingFilePathCharacter ('/':_) -- this was added because, if the template is title/episode and the title is not set, it would expand to "/episode". So this is another potential security fix.	2020-08-05 11:35:00 -04:00
Joey Hess	c4ec52b9ae	Slightly sped up the linux standalone bundle Reduce the number of directories listed in libdirs, which makes the linker check a lot less dead ends looking for directories. Eliminated some directories that didn't really contain shared libraries, or only contained the linker. That left only 2, one in lib and one in usr/lib, so consolidate those two. Doing it this way, rather than just consolidating all libs that might exist into a single directory means that, if there are optimised versions of some libs, eg in lib/subarch/foo.so, and lib/subarch2/foo.so, they don't get moved around in a way that would make the linker pick the wrong one.	2020-07-31 14:42:03 -04:00
Joey Hess	049807dbba	external backends implemented	2020-07-29 17:24:34 -04:00
Joey Hess	00c5f04f20	Deal with unusual IFS settings in the shell scripts for linux standalone and OSX app. Thanks, Yaroslav Halchenko	2020-07-24 14:46:50 -04:00
Joey Hess	79187a6eaf	Revert "Unset IFS in shell scripts in the linux standalone build and OSX app." This reverts commit `24125e8dc4`. yoh has a better patch I see	2020-07-24 14:33:13 -04:00
Joey Hess	24125e8dc4	Unset IFS in shell scripts in the linux standalone build and OSX app.	2020-07-24 14:31:11 -04:00
Joey Hess	c5ea2e9d12	better benchmark for move/copy speedup	2020-07-24 13:34:12 -04:00
Joey Hess	18f1fb5841	drop performance improvements Sped up seeking files to drop by 2x, and also some performance improvements to checking numcopies. Interestingly, the seek speedup is not due to precaching, but I think is due to calling getParsed earlier. Annex.Drop had to be changed to check inAnnex there, since it was removed from Command.Drop. All other users of Command.Drop already checked inAnnex themselves. This commit was sponsored by Ryan Newton on Patreon.	2020-07-24 13:27:46 -04:00
Joey Hess	d732ef1a89	move, copy: Sped up seeking for annexed files to operate on by a factor of nearly 2x.	2020-07-24 12:56:02 -04:00
Joey Hess	00865cdae8	Fix a bug in find --branch in the previous version inAnnex check was lost for that code path. To avoid more such mistakes, made withKeyOptions check it when the AnnexedFileSeeker specifies.	2020-07-24 12:05:28 -04:00
Joey Hess	cb74cefde7	Fix a hang when using git-annex with an old openssh 7.2p2 Which had some weird inheriting of ssh FDs by sshd. Bug was introduced in git-annex version 7.20200202.7.	2020-07-21 16:14:25 -04:00
Joey Hess	ac56a5c2a0	Fix a lock file descriptor leak that could occur when running commands like git-annex add with -J Bug was introduced as part of a different FD leak fix in version 6.20160318.	2020-07-21 15:30:47 -04:00
Joey Hess	798fdad660	fix build with dlist-1.0 That removed the list function. This new implementation appears to actually be more efficient anyway, since it avoids toList.	2020-07-21 12:58:51 -04:00
Joey Hess	1ccb6699a1	guidance on size and mtime fields	2020-07-20 19:56:47 -04:00
Joey Hess	abd56fb019	Fix a bug in find --batch in the previous version.	2020-07-20 19:50:53 -04:00
Joey Hess	af901d1366	releasing package git-annex version 8.20200720	2020-07-20 14:41:12 -04:00
Joey Hess	889603336a	fix reversion in skipping deleted files And add a test case for that. This certainly loses some of the 2x performance improvement in file seeking that seekFilteredKeys led to, because now it has to stat the worktree files again. Without benchmarking, I expect there will still be a sizable improvement, and also the git-annex branch precaching that seekFilteredKeys can do will still be a win of its approach. Also worth noting that lookupKey, when the file DNE, checks if it's in an adjusted branch with hidden files, and if so, finds the key for the file anyway. That was intended to make git-annex sync --content be able to process those files, but a side effect was that, when a file was deleted but the deletion not yet staged, git-annex commands used to still list it. That was actually a bug. This commit fixes that bug too. (git-annex sync --content on such a branch does not use seekFilteredKeys so was not affected by the reversion or by this behavior change) This commit was sponsored by Jake Vosloo on Patreon.	2020-07-19 21:25:01 -04:00
Joey Hess	7b2d236556	importfeed: stream metadata for 5% speedup On top of the 10% speedup from streaming url logs.	2020-07-14 14:35:26 -04:00
Joey Hess	535cdc8d48	importfeed: Made checking known urls step around 10% faster. This was a bit disappointing, I was hoping for a 2x speedup. But, I think the metadata lookup is wasting a lot of time and also needs to be made to stream. The changes to catObjectStreamLsTree were benchmarked to not also speed up --all around 3% more. Seems I managed to make it polymorphic after all.	2020-07-14 12:47:51 -04:00
Joey Hess	a6afa62a60	improve wording	2020-07-13 17:57:55 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	b4d0f6dfc2	slower but sequential filtering of large files from pointer files There should still be a speedup seeking over pointer files, just not as large as the one seeking over symlinks.	2020-07-10 15:21:58 -04:00
Joey Hess	de3d7d044d	make catObjectStream support newline and carriage return in filenames Turns out the %(rest) trick was not needed. Instead, just maintain a list of files we've asked for, and each cat-file response is for the next file in the list. This actually benchmarks 25% faster than before! Very surprising, but it must be due to needing to shove less data through the pipe, and parse less.	2020-07-08 13:49:03 -04:00
Joey Hess	d010ab04be	sped up the --all option by 2x to 16x by using git cat-file --buffer This assumes that no location log files will have a newline or carriage return in their name. catObjectStream skips any such files due to cat-file not supporting them. Keys have been prevented from containing newlines since 2011, commit `480495beb4`. If some old repo had a key with a newline in it, --all will just skip processing that key. Other things, like .git/annex/unused files certainly assume no newlines in keys too, and AFAICR, such keys never actually worked. Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys generated before that point could perhaps contain a CR. (URL probably not, http probably doesn't support an URL with a raw CR in it.) So, added a warning in fsck about such keys. Although, fsck --all will naturally skip them, so won't be able to warn about them. Not entirely satisfactory, but I'll bet there are not really any such keys in existence. Thanks to Lukey for finding this optimisation.	2020-07-07 13:54:04 -04:00
Joey Hess	d66fc1a464	Revert "async exception safety for coprocesses" This reverts commit `7013798df5`.	2020-07-06 15:11:28 -04:00
Joey Hess	dfa1c21b8a	comment and update changelog with benchmark results	2020-07-06 13:39:42 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	85cd79ea01	no importKey for android yet adb shell has sha256sum sha1sum and some others, so they could be used. They're provided by toybox, so seem about as likely to keep working as find and stat, which it already depends on. Or to not add a dep, could use stat the same as getExportContentIdentifier to get a mtime, and make a WORM key. But do I really want this to default to WORM? Unsure what's the best path, so punting for now.	2020-07-03 14:02:50 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	f912f8e5fd	refix bug in a better way Always run Git.Config.store, so when the git config gets reloaded, the override gets re-added to it, and changeGitRepo then calls extractGitConfig on it and sees the annex.* settings from the override. Remove any prior occurrence of -c v and add it to the end. This way, -c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2 Note that, if git had some multiline config that got built up by multiple -c's, this would not work still. But it never worked because before the bug got fixed in the first place, the -c value was repeated many times, so the multivalue thing would have been wrong. I don't think -c can be used with multiline configs anyway, though git-config does talk about them?	2020-07-02 13:32:33 -04:00
Joey Hess	ec0f8a6e74	Fix reversion that broke passing git configs with -c Reverting commit `c8fec6ab0`	2020-07-02 12:42:13 -04:00
Joey Hess	8a797358b7	changelog wording	2020-06-26 14:27:42 -04:00
Joey Hess	8b22e0bf37	lockContent for tahoe Trivial since git-annex cannot remove, but do an active checkKey verification anyway, in case the data was lost somehow. This commit was sponsored by Ryan Newton on Patreon.	2020-06-26 14:23:21 -04:00
Joey Hess	3175015d1b	lockContent for S3 (with versioning=yes) and git-lfs Made several special remotes support locking content on them while dropping, which allows dropping from another special remote when the content will only remain on a special remote of these types. In both cases, verify the content is present actively, because it's certainly possible for things other than git-annex to have removed it. Worth thinking about what to do if at some later point, git-lfs gains support for dropping content, and a content locking operation. That would probably need a transition; first would need to make lockContent use the locking operation. Then, once enough time had passed that we can assume any git-annex operating on the git-lfs remote had that change, git-annex could finally allow dropping from git-lfs. Or, it could be that git-lfs gains support for dropping content, but not locking it. In that case, it seems this commit would need to be reverted, and then wait long enough for that git-annex to be everywhere, and only then can git-annex safely support dropping from git-lfs. So, the assumption made in this commit could lead to bother later.. But I think it's actually highly unlikely git-lfs does ever support dropping; it's outside their centralized model. Probably. :) Worth keeping in mind as the same assumption is made about other special remotes though. This commit was sponsored by Ethan Aubin.	2020-06-26 13:46:42 -04:00
Joey Hess	4229713e63	importfeed: Added some additional --template variables for date and time This commit was sponsored by Ethan Aubin.	2020-06-24 14:24:50 -04:00
Joey Hess	b651d3ede0	test: Fix some test cases that assumed git's default branch name git is making that configurable, and configuring it globally would break the test suite in a few places. No other part of git-annex assumes any branch name. Renamed a few placeholders to make that clearer. This commit was sponsored by Jake Vosloo on Patreon.	2020-06-23 16:40:51 -04:00
Joey Hess	7757c0e900	Honor annex.largefiles when importing a tree from a special remote. This commit was sponsored by Martin D on Patreon.	2020-06-23 16:07:18 -04:00
Joey Hess	5098236c6b	testremote: Fix over-allocation of resources and bad caching Including starting up a large number of external special remote processes. (Regression introduced in version 8.20200501)	2020-06-22 14:25:49 -04:00
Joey Hess	104b3a9c6a	Build with the http-client-restricted library when available Otherwise use the vendored copy as before. The library is in Debian testing but not stable. Once it reaches stable, the vendored copy can be removed. Did not add it to debian/control because IIRC that's used to build git-annex on stable too, possibly. However, the Debian maintainer will probably want to make the package depend on libghc-http-client-restricted-dev This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:31:31 -04:00
Joey Hess	01eb863a14	Build with the git-lfs library when available Otherwise use the vendored copy as before. The library is in Debian testing but not stable. Once it reaches stable, the vendored copy can be removed. Did not add it to debian/control because IIRC that's used to build git-annex on stable too, possibly. However, the Debian maintainer will probably want to make the package depend on libghc-git-lfs-dev. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:21:25 -04:00

1491 commits