git-annex

Author	SHA1	Message	Date
Joey Hess	55bfa414b3	move transfer already in progress message to warning This makes it be displayed in the error-messages field with --json-error-messages. And with --quiet, it will let it be displayed, which makes sense because it's telling the user why what they requested to do has failed to happen.	2021-10-27 14:46:21 -04:00
Joey Hess	669037862a	avoid redundant freezeContent call This opens the potential for the object file to be in place but git-annex is interrupted before it can freeze it. git-annex fsck already fixes that situation, which can also occur when lockContentForRemoval thaws content. Also improve comment to not be Windows-specific.	2021-10-27 14:18:10 -04:00
Reiko Asakura	0db7297f00	Call freezeContent after move into annex This change better supports Windows ACL management using annex.freezecontent-command and annex.thawcontent-command and matches the behaviour of adding an unlocked file. By calling freezeContent after the file has moved into the annex, the file's delete permission can be denied. If the file's delete permission is denied before moving into the annex, the file cannot be moved or deleted. If the file's delete permission is not denied after moving into the annex, it will likely inherit a grant for the delete permission which allows it to be deleted irrespective of the permissions of the parent directory.	2021-10-27 14:05:57 -04:00
Joey Hess	5a9e6b1fd4	when private journal file exists, still read from git-annex branch Fix bug that caused stale git-annex branch information to be read when annex.private or remote.name.annex-private is set. The private journal file should not prevent reading more current information from the git-annex branch, but used to. Note that overBranchFileContents has to do additional work now: when there's a private journal file, it reads from the branch redundantly and more slowly. Sponsored-by: Jack Hill on Patreon	2021-10-26 13:43:50 -04:00
Joey Hess	0f38ad9a69	close keys db to possibly work around WSL1 issue	2021-10-19 13:07:49 -04:00
Joey Hess	887edeb1ad	avoid warning when built with unix-compat 0.5.3 It re-exports modificationTimeHiRes, and provides a windows version. Might be worth using that windows version eventually, but I have not tested it.	2021-10-18 16:25:28 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	19e78816f0	convert Key to ShortByteString This adds the overhead of a copy when serializing and deserializing keys. I have not benchmarked much, but runtimes seem barely changed at all by that. When a lot of keys are in memory, it improves memory use. And, it prevents keys sometimes getting PINNED in memory and failing to GC, which is a problem ByteString has sometimes. In particular, git-annex sync from a borg special remote had that problem and this improved its memory use by a large amount. Sponsored-by: Shae Erisson on Patreon	2021-10-05 20:20:08 -04:00
Joey Hess	9012fa0187	reinject: Fix crash when reinjecting a file from outside the repository Commit `4bf7940d6b` introduced this problem, but was otherwise doing a good thing. Problem being that fileRef "/foo" used to return ":./foo", which was actually wrong, but as long as there was no foo in the local repository, catKey could operate on it without crashing. After that fix though, fileRef would return eg "../../foo", resulting in fileRef returning ":./../../foo", which will make git cat-file crash since that's not a valid path in the repo. Fix is simply to make fileRef detect paths outside the repo and return Nothing. Then catKey can be skipped. This needed several bugfixes to dirContains as well, in previous commits. In Command.Smudge, this led to needing to check for Nothing. That case should actually never happen, because the fileoutsiderepo check will detect it earlier. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 14:06:34 -04:00
Joey Hess	b9aa2ce8d1	resume properly when copying a file to/from a local git remote is interrupted (take 2) This method avoids breaking test_readonly. Just check if the dest file exists, and avoid CoW probing when it does, so when CoW probing fails, it can resume where the previous non-CoW copy left off. If CoW has been probed already to work, delete the dest file since a CoW copy will presumably work. It seems like it would be almost as good to just skip CoW copying in this case too, but consider that the dest file might have started to be copied from some other remote, not using CoW, but CoW has been probed to work to copy from the current place. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:03:01 -04:00
Joey Hess	7ccf642863	revert change that broke test_readonly commit `63d508e885` broke test_readonly. When a local git remote is readonly, tryCopyCoW run to copy a file from it failed at withOtherTmp. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:02:41 -04:00
Joey Hess	e47b4badb3	separate handles for cat-file and cat-file --batch-check This avoids starting one process when only the other one is needed. Eg in git-annex smudge --clean, this reduces the total number of cat-file processes that are started from 4 to 2. The only performance penalty is that when both are needed, it has to do twice as much work to maintain the two Maps. But both are very small, consisting of 1 or 2 items, so that work is negligible. Sponsored-by: Dartmouth College's Datalad project	2021-09-24 13:16:13 -04:00
Joey Hess	798b33ba3d	simplify annex.bwlimit handling RemoteGitConfig parsing looks for annex.bwlimit when a remote does not have a per-remote config for it, so no need for a separate global config. Sponsored-by: Svenne Krap on Patreon	2021-09-22 10:52:01 -04:00
Joey Hess	05a097cde8	Merge branch 'master' into bwlimit	2021-09-22 10:48:27 -04:00
Joey Hess	4fef94d764	simplify annex.stalldetection handling RemoteGitConfig parsing looks for annex.stalldetection when a remote does not have a per-remote config for it, so no need for a separate global config. Sponsored-by: Noam Kremen on Patreon	2021-09-22 10:46:10 -04:00
Joey Hess	63d508e885	resume properly when copying a file to/from a local git remote is interrupted Probably this fixes a reversion, but I don't know what version broke it. This does use withOtherTmp for a temp file that could be quite large. Though it is a reflink copy that will not actually take up any space as long as the file it was copied from still exists. So if the copy cow succeeds but git-annex is interrupted just before that temp file gets renamed into the usual .git/annex/tmp/ location, there is a risk that the other temp directory ends up cluttered with a larger temp file than later. It will eventually be cleaned up, and the chances of this being a problem are small, so this seems like an acceptable thing to do. Sponsored-by: Shae Erisson on Patreon	2021-09-21 17:43:35 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100kb/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granularity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't be supported for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose display blocks the bandwidth-using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	ec12537774	defer write permissions checking in import until after copy to repo This should complete the fix started in `6329997ac4`, fixing the actual cause of the test suite failure this time. Sponsored-by: Dartmouth College's Datalad project	2021-09-02 13:45:21 -04:00
Joey Hess	bd5494bb9c	fix windows build	2021-09-02 12:21:25 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fails, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	6329997ac4	init: check for filesystem where write bit cannot be removed This fixes a reversion caused by `a99a84f342`, when git-annex init is run as root on a FAT filesystem mounted with hdiutil on OSX. Such a mount point has file mode 777 for everything and it cannot be changed. The existing crippled filesystem test tried to write to a file after removing write bit, but that test does not run as root (since root can write to unwritable files). So added a check of the write permissions of the file, after attempting to remove them. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 10:27:28 -04:00
Joey Hess	e853ef3095	decorate openTempFile errors with the template name This is to track down what file in .git/annex/ is being written to via a temp file when the repository is read-only. Sponsored-by: Dartmouth College's Datalad project	2021-08-30 13:05:02 -04:00
Joey Hess	a99a84f342	add: Detect when xattrs or perhaps ACLs prevent locking down a file's content And fail with an informative message. I don't think ACLs can prevent removing the write bit, but I'm not sure, so kept it mentioning them as a possibility. Should git-annex lock also check if the write bits are able to be removed? Maybe, but the case I know about with xattrs involves cp -a copying NFS xattrs, and it's the copy of the file that is the problem. So when locking a file, I guess it will not be the copy. Sponsored-by: Dartmouth College's Datalad project	2021-08-27 14:33:01 -04:00
Joey Hess	6d4a728455	Added annex.youtube-dl-command config This can be used to run some forks of youtube-dl. Sponsored-by: Brett Eisenberg on Patreon	2021-08-27 09:44:23 -04:00
Joey Hess	4ed36b2634	Fix test suite failure on Windows It would be better if the Arbitrary instance avoided generating impossible filenames like "foo/c:bar", but probably this is the only place that splits the file from the directory and then uses the file without the directory. At least on the quickcheck properties. Sponsored-by: Svenne Krap on Patreon	2021-08-24 14:03:29 -04:00
Joey Hess	492036622a	fix OSX build	2021-08-18 16:35:26 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	88b63a43fa	distinguish between incremental verification failing and not being done Sponsored-by: Dartmouth College's DANDI project	2021-08-18 14:38:02 -04:00
Joey Hess	325bfda12d	refactor	2021-08-18 13:37:00 -04:00
Joey Hess	449851225a	refactor IncrementalVerifier moved to Utility.Hash, which will let Utility.Url use it later. It's perhaps not really specific to hashing, but making a separate module just for the data type seemed unnecessary. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 13:19:02 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	a644f729ce	refactor fileCopier Sponsored-by: Dartmouth College's DANDI project	2021-08-16 15:56:24 -04:00
Joey Hess	d889ae0c01	move comment	2021-08-16 15:25:06 -04:00
Joey Hess	aac0654ff4	handle AlreadyInUseError As happens when using the directory special remote, gitlfs, webdav, and S3. But not external, adb, gcrypt, hook, or rsync. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 15:03:48 -04:00
Joey Hess	c4aba8e032	better handling of finishing up incomplete incremental verify Now it's run in VerifyStage. I thought about keeping the file handle open, and resuming reading where tailVerify left off. But that risks leaking open file handles, until the GC closes them, if the deferred verification does not get resumed. Since that could perhaps happen if there's an exception somewhere, I decided that was too unsafe. Instead, re-open the file, seek, and resume. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 14:52:59 -04:00
Joey Hess	e0b7f391bd	improve tailVerify Wait for the file to get modified, not only opened. This way, if a remote does not support resuming, and opens a new file over top of the existing file, it will wait until that remote starts writing, and open the file it's writing to, not the old file. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 14:47:37 -04:00
Joey Hess	e46a7dff6f	fix windows build	2021-08-13 16:36:33 -04:00
Joey Hess	16dd3dd4ca	catch more exceptions I saw this: .git/annex/tmp/SHA256E-s1234376--5ba8e06e0163b217663907482bbed57684d7188024155ddc81da0710dfd2687d: openBinaryFile: resource busy (file is locked) guess catching IO exceptions did not catch that one.	2021-08-13 16:16:46 -04:00
Joey Hess	ff2dc5eb18	INotify.removeWatch can crash Unsure why, possibly if the file has been replaced by another file.	2021-08-13 15:35:18 -04:00
Joey Hess	7503b8448b	inotify reports paths relative to directory being watched Sponsored-by: Dartmouth College's DANDI project	2021-08-13 14:51:15 -04:00
Joey Hess	e07625df8a	convert tailVerify to not finalize the verification Added failIncremental so it can force failure to verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 13:39:02 -04:00
Joey Hess	9d533b347f	tailVerify: return deferred action when it gets behind Sponsored-by: Dartmouth College's DANDI project	2021-08-13 12:32:01 -04:00
Joey Hess	b6efba8139	add tailVerify Not yet used, but this will let all remotes verify incrementally if it's acceptable to pay the performance price. See comment for details of when it will perform badly. I anticipate using this for all special remotes that use fileRetriever. Except perhaps for a few like GitLFS that could feed the incremental verifier themselves despite using that. Sponsored-by: Dartmouth College's DANDI project	2021-08-12 14:38:02 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	1acdd18ea8	deal better with clock skew situations, using vector clocks * Deal with clock skew, both forwards and backwards, when logging information to the git-annex branch. * GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1) rather than needing to be advanced each time a new change is made. * Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex. When changing a file in the git-annex branch, the vector clock to use is now determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK when set), and comparing it to the newest vector clock already in use in that file. If a newer time stamp was already in use, advance it forward by a second instead. When the clock is set to a time in the past, this avoids logging with an old timestamp, which would risk that log line later being ignored in favor of a "newer" line that is really not newer. When a log entry has been made with a clock that was set far ahead in the future, this avoids newer information being logged with an older timestamp and so being ignored in favor of that future-timestamped information. Once all clocks get fixed, this will result in the vector clocks being incremented, until finally enough time has passed that time gets back ahead of the vector clock value, and then it will return to usual operation. (This latter situation is not ideal, but it seems the best that can be done. The issue with it is, since all writers will be incrementing the last vector clock they saw, there's no way to tell when one writer made a write significantly later in time than another, so the earlier write might arbitrarily be picked when merging. This problem is why git-annex uses timestamps in the first place, rather than pure vector clocks.) Advancing forward by 1 second is somewhat arbitrary. setDead advances a timestamp by just 1 picosecond, and the vector clock could too. But then it would interfere with setDead, which wants to be overruled by any change. So it could use 2 picoseconds or something, but that seems weird. It could just as well advance it forward by a minute or whatever, but then it would be harder for real time to catch up with the vector clock when forward clock slew had happened. A complication is that many log files contain several different pieces of information, and it may be best to only use vector clocks for the same piece of information. For example, a key's location log file contains InfoPresent/InfoMissing for each UUID, and it only looks at the vector clocks for the UUID that is being changed, and not other UUIDs. Although exactly where the dividing line is can be hard to determine. Consider metadata logs, where a field "tag" can have multiple values set at different times. Should it advance forward past the last tag? Probably. What about when a different field is set, should it look at the clocks of other fields? Perhaps not, but currently it does, and this does not seem like it will cause any problems. Another one I'm not entirely sure about is the export log, which is keyed by (fromuuid, touuid). So if multiple repos are exporting to the same remote, different vector clocks can be used for that remote. It looks like that's probably ok, because it does not try to determine what order things occurred when there was an export conflict. Sponsored-by: Jochen Bartl on Patreon	2021-08-04 12:33:46 -04:00
Joey Hess	6111958440	fix test suite `14683da9eb` caused a test suite failure. When the content of a key is not present, a LinkAnnexFailed is returned, but replaceFile then tried to move the file into place, and since it was not written, that crashed. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-08-02 13:59:23 -04:00
Joey Hess	b3c4579c79	work around strange auto-init bug git-annex get when run as the first git-annex command in a new repo did not populate unlocked files. (Reversion in version 8.20210621) I am not entirely happy with this, because I don't understand how `428c91606b` caused the problem in the first place, and I don't fully understand how skipping calling scanAnnexedFiles during autoinit avoids the problem. Kept the explicit call to scanAnnexedFiles during git-annex init, so that when reconcileStaged is expensive, it can be made to run then, rather than at some later point when the information is needed. Sponsored-by: Brock Spratlen on Patreon	2021-07-30 18:36:03 -04:00
Joey Hess	748addbe05	remove second pass in scanAnnexedFiles The pass was needed to populate files when annex.thin was set, but in commit `73e0cbbb19`, reconcileStaged started to do that. So, this second pass is not needed any longer.	2021-07-30 17:46:11 -04:00
Joey Hess	817ccbbc47	split verifyKeyContent This avoids it calling enteringStage VerifyStage when it's used in places that only fall back to verification rarely, and which might be called while in TransferStage and be going to perform a transfer after the verification.	2021-07-29 13:58:40 -04:00
Joey Hess	897fd5c104	add note	2021-07-29 13:14:03 -04:00
Joey Hess	067a9c70c7	simplify code	2021-07-29 12:28:13 -04:00
Joey Hess	3e0b210039	remove unnecessary debugs Keeping the ones in Annex.InodeSentinal	2021-07-29 12:19:37 -04:00
Joey Hess	73e0cbbb19	fix problem populating pointer files This is a result of an audit of every use of getInodeCaches, to find places that misbehave when the annex object is not in the inode cache, despite pointer files for the same key being in the inode cache. Unfortunately, that is the case for objects that were in v7 repos that upgraded to v8. Added a note about this gotcha to getInodeCaches. Database.Keys.reconcileStaged, when annex.thin is set, would fail to populate pointer files in this situation. Changed it to check if the annex object is unmodified the same way inAnnex does, falling back to a checksum if the inode cache is not recorded. Sponsored-by: Dartmouth College's Datalad project	2021-07-27 14:26:49 -04:00
Joey Hess	de482c7eeb	move verifyKeyContent to Annex.Verify The goal is that Database.Keys be able to use it; it can't use Annex.Content.Presence due to an import loop. Several other things also needed to be moved to Annex.Verify as a consequence.	2021-07-27 14:07:23 -04:00
Joey Hess	14683da9eb	fix potential race in updating inode cache Some uses of linkFromAnnex are inside replaceWorkTreeFile, which was already safe, but others use it directly on the work tree file, which was race-prone. Eg, if the work tree file was first removed, then linkFromAnnex called to populate it, the user could have re-written it in the interim. This came to light during an audit of all calls of addInodeCaches, looking for such races. All the other uses of it seem ok. Sponsored-by: Brett Eisenberg on Patreon	2021-07-27 13:08:08 -04:00
Joey Hess	e4b2a067e0	fix potential race in updating inode cache In Annex.Content, the object file was statted after pointer files were populated. But if annex.thin is set, once the pointer files are populated, the object file can potentially be modified via the hard link. So, it was possible, though seemingly very unlikely, for the inode of the modified object file to be cached. Command.Fix and Command.Fsck had similar problems, statting the work tree files after they were in place. Changed them to stat the temp file that gets moved into place. This does rely on .git/annex being on the same filesystem. If it's not, the cached inode will not be the same as the one that the temp file gets moved to. Result will be that git-annex will later need to do an expensive verification of the content of the worktree files. Note that the cross-filesystem move of the temp file already is a larger amount of extra work, so this seems acceptable. Sponsored-by: Luke Shumaker on Patreon	2021-07-27 12:29:10 -04:00
Joey Hess	3b5a3e168d	check if object is modified before starting to send it Fix bug that caused some transfers to incorrectly fail with "content changed while it was being sent", when the content was not changed. While I don't know how to reproduce the problem that several people reported, it is presumably due to the inode cache somehow being stale. So check isUnmodified', and if it's not modified, include the file's current inode cache in the set to accept, when checking for modification after the transfer. That seems like the right thing to do for another reason: The failure says the file changed while it was being sent, but if the object file was changed before the transfer started, that's wrong. So it needs to check before allowing the transfer at all if the file is modified. (Other calls to sameInodeCache or elemInodeCaches, when operating on inode caches from the database, could also be problematic if the inode cache is somehow getting stale. This does not address such problems.) Sponsored-by: Dartmouth College's Datalad project	2021-07-26 17:33:49 -04:00
Joey Hess	f195f3b541	more inode cache debugging	2021-07-26 12:57:35 -04:00
Joey Hess	0073384850	add debugging in sameInodeCache	2021-07-26 10:58:07 -04:00
Joey Hess	33a80d083a	sync --quiet * sync: When --quiet is used, run git commit, push, and pull without their usual output. * merge: When --quiet is used, run git merge without its usual output. This might also make --quiet work better for some other commands that make commits, like git-annex adjust. Sponsored-by: Kevin Mueller on Patreon	2021-07-19 11:28:47 -04:00
Joey Hess	635e7f3e26	split annexLocations To avoid mistakes like commit `0ccbed4f6f`, be explicit about the two variants of this. Incidentally avoids a small amount of overhead in calling reverse. Sponsored-by: Shae Erisson on Patreon	2021-07-16 14:17:56 -04:00
Joey Hess	0ccbed4f6f	fix oops `dd31fe7b9e` broke non-bare repos by using bare hash dirs first, oops	2021-07-15 21:01:07 -04:00
Joey Hess	dd31fe7b9e	fall back to checking lower case hash directories in normal repo Fix a bug that prevented getting content from a repository that started out as a bare repository, or had annex.crippledfilesystem set, and was converted to a non-bare repository. This unfortunately means that inAnnex check gets slowed down by a stat call in normal repos when the content is not present. Oh well, such is the cost of backwards compatibility with old mistakes. Sponsored-by: Mark Reidenbach on Patreon	2021-07-15 12:16:31 -04:00
Joey Hess	6a581f8b8b	fix init reversion when core.sharedRepository = group init: Fix misbehavior when core.sharedRepository = group that caused it to enter an adjusted branch. (Reversion in version 8.20210630) Commit `4b1b9d7a83` made init call freezeContent in case there was a hook that could prevent writing in situations where perms don't. But with the above git config, freezeContent does not prevent write at all. So init needs to do what freezeContent does with a non-shared git config. Or init could check for that config, and skip the probing, since it won't actually be preventing write to any files. But that would make init too aware of details of Annex.Perms, and also would break if the git config were changed after init. Sponsored-by: Dartmouth College's Datalad project	2021-07-12 10:15:49 -04:00
Joey Hess	9905ec19a7	add pointer to annex.security.allowed-url-schemes Sponsored-by: Kevin Mueller on Patreon	2021-07-02 10:53:45 -04:00
Joey Hess	3a14648142	dropping unused marks as dead Dropping an object with drop --unused or dropunused will mark it as dead, preventing fsck --all from complaining about it after it's been dropped from all repositories. If another repository still has a copy, it won't be treated as dead until it's also dropped from there. The drop has to use --unused, can't be --key or something else, because this indicates that the user has recently run git-annex unused. If it checked the unused log on every drop, bad things would happen when the unused log was out of date, eg a file used to be unused but then got re-added. Marking such a file as dead could be confusing. When the user uses --unused/dropunused, they must consider the unused information to be up-to-date. The particular workflow this enables is: `git annex add foo`, `git annex unannex foo`, `git annex unused`, `git annex drop --unused` / `dropunused`, `git annex fsck --all` (no warnings). The docs for git-annex unannex say to use git-annex unused and dropunused, so the user should be pointed in this direction when they want to undo an accidental add. Sponsored-by: Brock Spratlen on Patreon	2021-06-25 15:22:26 -04:00
Joey Hess	df2001aa88	Improve display of errors when transfers fail Transfers from or to a local git repo could fail without a reason being given, if the content failed to verify, or if the object file's stat changed while it was being copied. Now display messages in these cases. Sponsored-by: Jack Hill on Patreon	2021-06-25 13:17:04 -04:00
Joey Hess	51c696679f	avoid using temp file size when deciding whether to retry failed transfer When stall detection is enabled, and a transfer is in progress, it would display a doubled message: (transfer already in progress, or unable to take transfer lock) (transfer already in progress, or unable to take transfer lock) That happened because the forward retry decider had a start size of 0, and an end size of whatever amount of the object the other process had downloaded. So it incorrectly thought that the transferrer process had made progress, when it had in fact immediately given up with that message. Instead, use the reported value from the progress meter. If a remote does not report progress, this will mean it doesn't forward retry, in a situation where it used to. But most remotes do report progress, and any remote that does not can be fixed to, by using watchFileSize when downloading. Also, some remotes might preallocate the temp file (eg bittorrent), so relying on statting its size at this level to get progress is dubious. The same change was made to Annex/Transfer.hs, although only Annex/TransferrerPool.hs needed to be changed to avoid the duplicate message. (An alternate fix would have been to start the retry decider with the size of the object file before downloading begins, rather than 0.) Sponsored-by: Brett Eisenberg on Patreon	2021-06-25 12:04:23 -04:00
Joey Hess	0fe550af75	fix windows build	2021-06-22 09:46:06 -04:00
Joey Hess	4b1b9d7a83	Added annex.freezecontent-command and annex.thawcontent-command configs Freeze first sets the file perms, and then runs freezecontent-command. Thaw runs thawcontent-command before restoring file permissions. This is in case the freeze command prevents changing file perms, as eg setting a file immutable does. Also, changing file perms tends to mess up previously set ACLs. git-annex init's probe for crippled filesystem uses them, so if file perms don't work, but freezecontent-command manages to prevent write to a file, it won't treat the filesystem as crippled. When the filesystem has been probed as crippled, the hooks are not used, because there seems to be no point then; git-annex won't be relying on locking annex objects down. Also, this avoids them being run when the file perms have not been changed, in case they somehow rely on git-annex's setting of the file perms in order to work. Sponsored-by: Dartmouth College's Datalad project	2021-06-21 14:40:52 -04:00
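
The ordering described in the row above (freeze: permissions first, then the hook; thaw: the hook first, then permissions) might look like this in outline. This is a Python sketch under assumptions: `freeze_content`, `thaw_content` and the `hook` argument are hypothetical, and the hook is modelled as an argv list that receives the path as its last argument, which is a simplification and not how git-annex invokes the configured commands.

```python
import os
import stat
import subprocess
from typing import List, Optional

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def freeze_content(path: str, hook: Optional[List[str]] = None) -> None:
    """Remove write permission first, then run the (hypothetical) freeze hook."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)
    if hook:
        # The hook runs last because it may make the file immutable,
        # after which chmod would no longer work.
        subprocess.run(hook + [path], check=True)


def thaw_content(path: str, hook: Optional[List[str]] = None) -> None:
    """Run the (hypothetical) thaw hook first, then restore owner write permission."""
    if hook:
        # Undo immutability (or similar) before touching permissions.
        subprocess.run(hook + [path], check=True)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)
```

Keeping the two operations as mirror images means a hook that blocks permission changes never sits between git-annex and the chmod call it needs to make.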
Joey Hess	ba62c3467b	remove dead code	2021-06-21 13:54:12 -04:00
Joey Hess	4eb3778aec	remove unused import	2021-06-21 12:32:36 -04:00
Joey Hess	694fe3702c	fix 2 build warnings	2021-06-21 11:27:18 -04:00
Joey Hess	d2be68907c	drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies Eg, before with a .gitattributes like: .2 annex.numcopies=2 .1 annex.numcopies=1 And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2 would succeed, leaving just 1 copy, despite foo.2 needing 2 copies. It dropped foo.1 first and then skipped foo.2 since its content was gone. Now that the keys database includes locked files, this longstanding wart can be fixed. Sponsored-by: Noam Kremen on Patreon	2021-06-15 11:38:44 -04:00
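
The rule in the row above, that a key shared by several files must satisfy the strictest numcopies setting among them, can be sketched as below. This is illustrative Python, not git-annex code; `required_numcopies` and `can_drop` are hypothetical names, and the model assumes a local drop is allowed only when at least the required number of copies exist in other repositories.

```python
from typing import Dict, Iterable


def required_numcopies(files_for_key: Iterable[str],
                       numcopies_of: Dict[str, int],
                       default: int = 1) -> int:
    """Return the highest numcopies value among all files sharing one key."""
    return max((numcopies_of.get(f, default) for f in files_for_key),
               default=default)


def can_drop(copies_elsewhere: int, required: int) -> bool:
    """A local drop is safe only if enough copies exist elsewhere."""
    return copies_elsewhere >= required


# foo.1 and foo.2 have the same content, hence the same key.
settings = {"foo.1": 1, "foo.2": 2}
needed = required_numcopies(["foo.1", "foo.2"], settings)
```

With one copy elsewhere, `can_drop(1, needed)` is false, so dropping via `foo.1` no longer slips past the stricter setting on `foo.2`.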
Joey Hess	0ed1369dcd	remove unused import	2021-06-15 11:31:59 -04:00
Joey Hess	af9fdf5dba	verify associated files when checking numcopies Most of this is just refactoring. But, handleDropsFrom did not verify that associated files from the keys db were still accurate, and has now been fixed to. A minor improvement to this would be to avoid calling catKeyFile twice on the same file, when getting the numcopies and mincopies value, in the common case where the same file has the highest value for both. But, it avoids checking every associated file, so it will scale well to lots of dups already. Sponsored-by: Kevin Mueller on Patreon	2021-06-15 11:14:52 -04:00
Joey Hess	0b91afb57d	avoid warning	2021-06-15 11:11:55 -04:00
Joey Hess	77517ab506	avoid nub It's O(N^2) which could matter when there are many dup files using the same key.	2021-06-15 10:48:11 -04:00
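
To illustrate the cost mentioned in the row above: Haskell's `nub` compares each element against every element already kept, which is quadratic. The Python sketch below contrasts that shape with a set-based approach. The function names are hypothetical, and the commit message does not say what replaced `nub`, so the second function is one common alternative, not necessarily the one git-annex used.

```python
from typing import Hashable, List, Sequence


def dedup_quadratic(items: Sequence[Hashable]) -> List[Hashable]:
    """Order-preserving dedup with a list membership test: O(N^2) overall."""
    out: List[Hashable] = []
    for x in items:
        if x not in out:  # linear scan of everything kept so far
            out.append(x)
    return out


def dedup_with_set(items: Sequence[Hashable]) -> List[Hashable]:
    """Order-preserving dedup using a set for membership: about O(N)."""
    seen = set()
    out: List[Hashable] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```

Both return the same result; the difference only matters when many duplicate files share one key, which is the case the commit is concerned with.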
Joey Hess	3af4c9a29a	fix exponential blowup when adding lots of identical files This was an old problem when the files were being added unlocked, so the changelog mentions that being fixed. However, recently it's also affected locked files. The fix for locked files is kind of stupidly simple. moveAnnex already handles populating unlocked files, and only does it when the object file was not already present. So remove the redundant populateUnlockedFiles call. (That call was added all the way back in `cfaac52b88`, and has always been unnecessary.) Sponsored-by: Dartmouth College's Datalad project	2021-06-15 09:45:55 -04:00
Joey Hess	e147ae07f4	remove supportUnlocked check that is not worth its overhead moveAnnex only gets to that check if the object file was not present before. So in the case where dup files are being added repeatedly, it will only run the first time, and so there's no significant speedup from doing it; all it avoids is a single sqlite lookup. Since MVar accesses do have overhead, it's better to optimise for the common case, where unlocked files are supported. removeAnnex is less clear cut, but I think mostly is skipped running on keys when the object has already been dropped, so similar reasoning applies.	2021-06-15 09:28:56 -04:00
Joey Hess	dcd2c95249	fix windows build	2021-06-14 12:43:26 -04:00
Joey Hess	014dc63a55	avoid sometimes expensive operations when annex.supportunlocked = false This will mostly just avoid a DB lookup, so things get marginally faster. But in cases where there are many files using the same key, it can be a more significant speedup. Added overhead is one MVar lookup per call, which should be small enough, since this happens after transferring or ingesting a file, which is always a lot more work than that. It would be nice, though, to move getGitConfig to AnnexRead, which there is an open todo about.	2021-06-14 12:40:41 -04:00
Joey Hess	c4f1465a81	check symlink before reading file This is faster because when multiple files are in a directory, it gets cached.	2021-06-14 11:53:51 -04:00
Joey Hess	26a9ea12d1	handle edge case of symlink to something that is not really a pointer file That seems very unlikely to happen, but still, it's possible it could. And with the recent addition of locked files to the keys db, this could be called by places that did not call it before, so it seems even more important it's correct. Adds an extra stat of the file, and is potentially racy, but both problems are fixed by the unix-2.8.0 path. I have not tested that path builds because that package is not yet released and it would be difficult to install it since it's tightly tied to a ghc version.	2021-06-14 11:35:52 -04:00
Joey Hess	673b2feaf3	rename for clarity Associated files are recorded now also for locked files, but this is only needed to populate unlocked files.	2021-06-14 10:55:24 -04:00
Joey Hess	7b6deb1109	display scanning message whenever reconcileStaged has enough files to chew on Clear visible progress bar first. Removed showSideActionAfter because it can't be used in reconcileStaged (import loop). Instead, it counts the number of files it processes and displays it after it's seen a sufficient number to know it's taking a while. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 12:48:30 -04:00
Joey Hess	13b9a288d3	scanAnnexedFiles in smudge --update This makes git checkout and git merge hooks do the work to catch up with changes that they made to the tree. Rather than doing it at some later point when the user is not thinking about that past operation. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 11:37:47 -04:00
Joey Hess	7f742589f9	claw back annexed file scan speedup Following commit `c941ab6f5b`, this avoids the second, redundant scan when annex.thin is not set. The benchmark now runs in 35.5 seconds, down from 40 seconds. Note that the inode cache of the annex object has to be passed to addInodeCaches now, because it might not already be in the inode caches, unlike previously. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 11:09:15 -04:00
Joey Hess	c941ab6f5b	avoid double work in git-annex init, second try reconcileStaged populates the db, so scanAnnexedFiles does not need to do it again. It still makes a pass over the HEAD tree, but populating the db was most of the expensive part. Benchmarking with 100,000 files, git-annex init now takes 40 seconds, vs 37 seconds with the old, buggy version of this fix. It should be possible to win those 3 precious seconds per 100k files back, in the case when annex.thin is not set, with improvements to reconcileStaged that avoid needing this second pass. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 09:36:53 -04:00
Joey Hess	2cb7b7b336	Revert "avoid double work in git-annex init" This reverts commit `0f10f208a7`. The implementation of this turns out to be unsafe; it can lead to a keys db deadlock. scanAnnexedFiles injects a call to inAnnex into reconcileStaged, but inAnnex sometimes needs to read from the keys db, which will try to re-open it when it's in the process of being opened. The exclusive lock of gitAnnexKeysDbLock will then deadlock. This needs to be done in some other way...	2021-06-08 09:11:24 -04:00
Joey Hess	0f10f208a7	avoid double work in git-annex init reconcileStaged was doing a redundant scan to scanAnnexedFiles. It would probably make sense to move the body of scanAnnexedFiles into reconcileStaged, the separation does not really serve any purpose. Sponsored-by: Dartmouth College's Datalad project	2021-06-07 16:50:14 -04:00
Joey Hess	0434674c85	avoid displaying the scanning annexed files message when repo is not large Avoids users thinking this scan is a big deal, when it's not in the majority of repos. showSideActionAfter has some ugly caveats, since it has to display in the background of another action. I could not see a better way to do it and it works fine in this particular case. It also doesn't really belong in Annex.Concurrent, but cannot go in Messages due to an import loop. Sponsored-by: Dartmouth College's Datalad project	2021-06-04 13:16:48 -04:00
Joey Hess	0f54e5e0ae	speed up initial scanning for annexed files Streaming through git this way speeds it up by around 25%. This is similar to the optimisations of seeking annexed files. Sponsored-by: Dartmouth College's Datalad project	2021-05-31 14:29:34 -04:00
Joey Hess	aa00e171cb	annex.supportunlocked should not prevent scan for annexed files That scan used to be only for unlocked files, but no longer.	2021-05-31 10:51:39 -04:00
Joey Hess	189fb05ffb	Added annex.adviceNoSshCaching config. Sponsored-by: Brock Spratlen on Patreon	2021-05-27 12:37:49 -04:00
Joey Hess	cedc28a783	prevent dropping required content of other file using same content When two files have the same content, and a required content expression matches one but not the other, dropping the latter file will fail as it would also remove the content of the required file. This will slow down drop (w/o --auto), dropunused, mirror, and move, by one keys db lookup per file. But I did include an optimisation to avoid a double db lookup in the drop --auto / sync --content case. I suspect that dropunused could also use PreferredContentChecked True, but haven't entirely thought it through and it's rarely used with enough files for the optimisation to matter. Sponsored-by: Dartmouth College's Datalad project	2021-05-25 11:34:06 -04:00
Joey Hess	f46e4c9b7c	fix case where keys db was not initialized in time When the keys db is opened for read, and did not exist yet, it used to skip creating it, and return mempty values. But that prevents reconcileStaged from populating associated files information in time for the read. This fixes the one remaining case I know of where the fix in `a56b151f90` didn't work. Note that, when there is a permissions error, it still avoids creating the db and returns mempty for all queries. This does mean that reconcileStaged does not run and so it may want to drop files that it should not. However, presumably a permissions error on the keys database also means that the user does not have permission to delete annex objects, so they won't be able to drop the files anyway. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:46:59 -04:00
Joey Hess	a56b151f90	fix longstanding indeterminate preferred content for duplicated file problem * drop: When two files have the same content, and a preferred content expression matches one but not the other, do not drop the file. * sync --content, assistant: Fix an edge case where a file that is not preferred content did not get dropped. The sync --content edge case is that handleDropsFrom loaded associated files and used them without verifying that the information from the database was not stale. It seemed best to avoid changing --want-drop's behavior, this way when debugging a preferred content expression with it, the files matched will still reflect the expression. So added a note to the --want-drop documentation, to make clear it may not behave identically to git-annex drop --auto. While it would be possible to introspect the preferred content expression to see if it matches on filenames, and only look up the associated files when it does, it's generally fairly rare for 2 files to have the same content, and the database lookup is already avoided when there's only 1 file, so I did not implement that further optimisation. Note that there are still some situations where the associated files database does not get locked files recorded in it, which will prevent this fix from working. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:07:05 -04:00
Joey Hess	428c91606b	include locked files in the keys database associated files Before only unlocked files were included. The initial scan now scans for locked as well as unlocked files. This does mean it gets a little bit slower, although I optimised it as well as I think it can be. reconcileStaged changed to diff from the current index to the tree of the previous index. This lets it handle deletions as well, removing associated files for both locked and unlocked files, which did not always happen before. On upgrade, there will be no recorded previous tree, so it will diff from the empty tree to current index, and so will fully populate the associated files, as well as removing any stale associated files that were present due to them not being removed before. reconcileStaged now does a bit more work. Most of the time, this will just be due to running more often, after some change is made to the index, and since there will be few changes since the last time, it will not be a noticeable overhead. What may turn out to be a noticeable slowdown is after changing to a branch, it has to go through the diff from the previous index to the new one, and if there are lots of changes, that could take a long time. Also, after adding a lot of files, or deleting a lot of files, or moving a large subdirectory, etc. Command.Lock used removeAssociatedFile, but now that's wrong because a newly locked file still needs to have its associated file tracked. Command.Rekey used removeAssociatedFile when the file was unlocked. It could remove it also when it's locked, but it is not really necessary, because it changes the index, and so the next time git-annex runs and accesses the keys db, reconcileStaged will run and update it. There are probably several other places that use addAssociatedFile and don't need to any more for similar reasons. But there's no harm in keeping them, and it probably is a good idea to, if only to support mixing this with older versions of git-annex. 
However, mixing this and older versions does risk reconcileStaged not running, if the older version already ran it on a given index state. So it's not a good idea to mix versions. This problem could be dealt with by changing the name of the gitAnnexKeysDbIndexCache, but that would leave the old file dangling, or it would need to keep trying to remove it.	2021-05-21 16:24:37 -04:00
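
The diff-based update described in the commit above (previous tree to current index, handling additions and deletions alike) can be sketched as follows. This is illustrative Python, not git-annex code: `reconcile` is a hypothetical name, trees are modelled as plain `path -> key` dictionaries, and the associated-files table as `key -> set of paths`. Cleanup of stale entries left behind by older versions is not modelled.

```python
from typing import Dict, Set


def reconcile(assoc: Dict[str, Set[str]],
              old_tree: Dict[str, str],
              new_index: Dict[str, str]) -> None:
    """Update key -> files associations from a diff of two path -> key maps."""
    # Paths that were removed, or now point at a different key.
    for path, key in old_tree.items():
        if new_index.get(path) != key:
            files = assoc.get(key)
            if files is not None:
                files.discard(path)
                if not files:
                    del assoc[key]
    # Paths that are new, or now point at a different key.
    for path, key in new_index.items():
        if old_tree.get(path) != key:
            assoc.setdefault(key, set()).add(path)


assoc: Dict[str, Set[str]] = {}
# No recorded previous tree (the upgrade case): diff from the empty tree.
reconcile(assoc, {}, {"a": "KEY1", "b": "KEY1", "c": "KEY2"})
# Later, "b" is deleted and "c" changes content.
reconcile(assoc, {"a": "KEY1", "b": "KEY1", "c": "KEY2"},
          {"a": "KEY1", "c": "KEY3"})
```

Because only changed paths are visited, the work is proportional to the size of the diff, which matches the commit's observation that the cost is small for typical changes and larger after a branch switch.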

1813 commits