git-annex

Author	SHA1	Message	Date
Joey Hess	766720d093	soften language in changelog This bug mostly would happen when the downloads ran very fast or were all failing (how I reproduced it), because there have to be two downloads that finish very close to the same time to trigger the race. So most users of -J probably would not see much impact from the bug.	2021-11-19 12:52:22 -04:00
Joey Hess	623a775609	fix cat-file leak in get with -J Bugfix: When -J was enabled, getting files leaked an ever-growing number of git cat-file processes. (Since commit `dd39e9e255`) The leak happened when mergeState called stopNonConcurrentSafeCoProcesses. While stopNonConcurrentSafeCoProcesses usually manages to stop everything, there was a race condition where cat-file processes were leaked. Because catFileStop modifies Annex.catfilehandles in a non-concurrency safe way, and could clobber modifications made in between. Which should have been ok, since originally catFileStop was only used at shutdown. Note the comment on catFileStop saying it should only be used when nothing else is using the handles. It would be possible to make catFileStop race-safe, but it should just not be used in a situation where a race is possible. So I didn't bother. Instead, the fix is just not to stop any processes in mergeState. Because in order for mergeState to be called, dupState must have been run, and it enables concurrency mode, stops any non-concurrent processes, and so all processes that are running are concurrency safe. So there is no need to stop them when merging state. Indeed, stopping them would be extra work, even if there was not this bug. Sponsored-by: Dartmouth College's Datalad project	2021-11-19 12:51:08 -04:00
Joey Hess	31be0770a5	importfeed: Display url before starting youtube-dl download It was displaying a blank line before.	2021-11-17 13:23:55 -04:00
Joey Hess	c3af94eff4	releasing package git-annex version 8.20211117	2021-11-17 12:20:29 -04:00
Joey Hess	2bd778a46e	importfeed: Fix a crash when used in a non-unicode locale See comment for analysis. At first I thought I'd need to convert all T.unpack in git-annex, but luckily not -- so long as the Text is read from a file, the filesystem encoding is applied and T.unpack is fine. It's only when using Feed that the filesystem encoding is not applied. While this fixes the crash, it does result in some mojibake, eg: itemid=http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-��-questions/ Have not tracked that down, but it must be unrelated, because I've verified that it roundtrips when using encodeUtf8: joey@darkstar:~/src/git-annex>LANG=C ghci Utility/FileSystemEncoding.hs ghci> useFileSystemEncoding ghci> Just f <- Text.Feed.Import.parseFeedFromFile "/home/joey/tmp/career_tools_podcasts.xml" ghci> Just (_, x) = Text.Feed.Query.getItemId (Text.Feed.Query.feedItems f !! 0) ghci> decodeBS (Data.Text.Encoding.encodeUtf8 x) "http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-\56546\56448\56467-questions/" ghci> writeFile "foo" $ decodeBS (Data.Text.Encoding.encodeUtf8 x) Writes a file containing the ENDASH character. Sponsored-by: Jochen Bartl on Patreon	2021-11-15 15:04:21 -04:00
Joey Hess	aa6e54ac6e	Fix a typo in the name of youtube-dl (reversion introduced in version 8.20210903)	2021-11-13 08:58:36 -04:00
Joey Hess	51b73ea1fc	migrate: New --remove-size option While intended for converting URL keys added by addurl --fast to be as if added by addurl --relaxed, it can also be used to remove size from other types of keys. Although that is not likely to be useful for checksummed keys, I suppose it could be used for WORM or other non-checksum keys. Specifying the --remove-size option does not prevent other migrations from taking effect if there's a key upgrade to perform, or if the backend has changed. So --backend=URL needs to be used to prevent migrating an URL key to the default backend. Note that it's not possible to use git-annex migrate to convert from a non-URL key to an URL key, as URL keys cannot be generated, except by addurl. So while this can get the same effect as --relaxed would have when addurl --fast was used, when --fast was not used, it won't work, or if --backend=URL is not used will remove the size but not prevent checksum verification, which is not useful. Due to this complexity, I decided not to mention it in the git-annex addurl man page. Sponsored-by: Jochen Bartl on Patreon	2021-11-12 13:28:28 -04:00
Joey Hess	f3326b8b5a	git-lfs gitlab interoperability fix git-lfs: Fix interoperability with gitlab's implementation of the git-lfs protocol, which requests Content-Encoding chunked. Sponsored-by: Dartmouth College's Datalad project	2021-11-10 13:51:11 -04:00
Joey Hess	9d3ce224e3	uninit edge cases * uninit: Avoid error message when no commits have been made to the repository yet. * uninit: Avoid error message when there is no git-annex branch. Sponsored-by: Svenne Krap on Patreon	2021-11-08 16:47:00 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	438e5b56aa	tighter --json parsing for metadata metadata --batch --json: Reject input whose "fields" does not consist of arrays of strings. Such invalid input used to be silently ignored. Used to be that parseJSON for a JSONActionItem ran parseJSON separately for the itemAdded, and if that failed, did not propagate the error. That allowed different items with differently named fields to be parsed. But it was actually only used to parse "fields" for metadata, so that flexibility is not needed. The fix is just to parse "fields" as-is. AddJSONActionItemFields is needed only because of the wonky way Command.MetaData adds onto the started json object. Note that this line got a dummy type signature added, just because the type checker needs it to be some type. itemFields = Nothing :: Maybe Bool Since it's Nothing, it doesn't really matter what type it is, and the value gets turned into json and is then thrown away. Sponsored-by: Kevin Mueller on Patreon	2021-11-01 14:42:37 -04:00
Joey Hess	80f1354685	metadata --batch: Avoid crashing when a non-annexed file is input Turns out that CommandStart actions do not have their exceptions caught, which is why the giveup was causing a crash. Mostly these actions do not do very much work on their own, but it does seem possible there are other commands whose CommandStart also throws an exception. So, my first attempt at a fix was to catch those exceptions. But, --json-error-messages then causes a difficulty, because in order to output a json error message, an action needs to have been started; that sets up the json object that the error message will be included in a field of. While it would be possible to output an object with just an error field, this would be json output of a format that the user has no reason to expect, that happens only in an exceptional circumstance. That is something I have always wanted to avoid with the json output; while git-annex man pages don't document what the json looks like, the output has always been made to be self-describing. Eg, it includes "error-messages":[] even when there's no errors. With that ruled out, it doesn't seem a good idea to catch CommandStart exceptions and display the error to stderr when --json-error-messages is set. And so I don't know if it makes sense to catch exceptions from that at all. Maybe I'd have a different opinion if --json-error-messages did not exist though. So instead, output a blank line like other batch commands do. This also leaves open the possibility of implementing support for matching object with metadata --json, which would also want to output a blank line when the input didn't match. Sponsored-by: Dartmouth College's DANDI project	2021-11-01 13:40:43 -04:00
Joey Hess	c260833a6b	releasing package git-annex version 8.20211028	2021-10-28 12:00:56 -04:00
Joey Hess	eb95ed4863	fix addurl concurrency issue addurl: Support adding the same url to multiple files at the same time when using -J with --batch --with-files. Implementation was easier than expected, was able to reuse OnlyActionOn. While it will download the url's content multiple times, that seems like the best thing to do; see my comment for why. Sponsored-by: Dartmouth College's DANDI project	2021-10-27 16:15:41 -04:00
Joey Hess	669037862a	avoid redundant freezeContent call This opens the potential for the object file to be in place but git-annex is interrupted before it can freeze it. git-annex fsck already fixes that situation, which can also occur when lockContentForRemoval thaws content. Also improve comment to not be Windows-specific.	2021-10-27 14:18:10 -04:00
Joey Hess	0756625e1b	update, bugfix also fixed git-annex info	2021-10-27 12:22:02 -04:00
Joey Hess	b2c48fb86b	Fix using lookupkey inside a subdirectory Caused by dirContains ".." "foo" being incorrectly False. Also added a test of dirContains, which includes all the previous bug fixes I could find and some obvious cases. Reversion in version 8.20211011 Sponsored-by: Brett Eisenberg on Patreon	2021-10-26 15:00:45 -04:00
Joey Hess	5a9e6b1fd4	when private journal file exists, still read from git-annex branch Fix bug that caused stale git-annex branch information to be read when annex.private or remote.name.annex-private is set. The private journal file should not prevent reading more current information from the git-annex branch, but used to. Note that overBranchFileContents has to do additional work now: when there's a private journal file, it reads from the branch redundantly and more slowly. Sponsored-by: Jack Hill on Patreon	2021-10-26 13:43:50 -04:00
Joey Hess	2801528eb2	oops, I misread, still happens for adjusted branches	2021-10-20 13:45:56 -04:00
Joey Hess	f7b5a5c9ed	changelog A user tested `0f38ad9a69` on WSL, and it seems to have fixed the problem.	2021-10-20 13:26:01 -04:00
Joey Hess	f4bdecc4ec	improve sqlite MultiWriter handling of read after write This removes a messy caveat that was easy to forget and caused at least one bug. The price paid is that, after a write to a MultiWriter db, it has to close the db connection that it had been using to read, and open a new connection. So it might be a little bit slower. But, writes are usually batched together, so there's often only a single write, and so there should not be much of a slowdown. Notice that SingleWriter already closed the db connection after a write, so paid the same overhead. This is the second try at fixing a bug: git-annex get when run as the first git-annex command in a new repo did not populate all unlocked files. (Reversion in version 8.20210621) Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-10-19 15:13:29 -04:00
Joey Hess	29d687dce9	When retrieval from a chunked remote fails, display the error that occurred when downloading the chunk Rather than the error that occurred when trying to download the unchunked content, which is less likely to actually be stored in the remote. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-10-14 12:45:05 -04:00
Joey Hess	b36cc0320e	avoid crashing tilde expansion on user who does not exist git does not crash when there's a remote configured for a user who does not exist, and this prevents git-annex from crashing too. Consider that a user might exist on one system but not another, and the git repo be moved between systems. So not crashing is desirable. Note that git fetch seems to mishandle a remote path like ~foo/bar when the user does not exist. While it does access ./~foo/bar, and gets as far as running git-upload-pack on the path, it then complains there is no such repo. So different parts of git seem to be doing different things in that edge case. Anyway, git-annex does not need to be bug-for-bug compatible with git. Sponsored-by: Jack Hill on Patreon	2021-10-13 09:16:36 -04:00
Joey Hess	c2a44eab50	move gpg tmp home to system temp dir test: Put gpg temp home directory in system temp directory, not filesystem being tested. Since I've found indications gpg can fail talking to the agent when the socket ends up on eg, fat. And to hopefully fix this bug report I've followed up on. The main risk in using the system temp dir is that TMPDIR could be set to a long directory path, which is too long to put a unix socket in. To partially ameliorate that risk, it uses either an absolute or a relative path, whichever is shorter. (Hopefully gpg will not convert it to a longer form of the path..) If the user sets TMPDIR to something so long a path to it + "S.gpg-agent" is too long, I suppose that's their issue to deal with. Sponsored-by: Dartmouth College's Datalad project	2021-10-12 13:29:56 -04:00
Joey Hess	17a0fa3dbc	negotiate P2P protocol version for tor remotes This negotiation is not supported by versions of git-annex older than 6.20180312. Well, maybe really 6.20180227 or so, but using that in the changelog simplifies things since it was the version for the other changes as well. See commit `c81768d425` for the back story. As well as allowing for future protocol improvements, this will result in negotiating protocol version 1, which is an improvement over default version 0. In fact, it looks like no supported version of git-annex will use protocol version 0, since version 1 was introduced in 6.20180227. Still, removing the code for version 0 seems unnecessary. See commit `31e1adc005`. Sponsored-by: Brett Eisenberg on Patreon.	2021-10-11 15:58:51 -04:00
Joey Hess	f8816d2b92	remove list of removed commands the list was wrong and also users shouldn't need to know	2021-10-11 15:43:19 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	e28cf82b45	releasing package git-annex version 8.20211011	2021-10-11 12:53:17 -04:00
Joey Hess	022bb6174c	Merge branch 'borgchunks'	2021-10-08 13:26:45 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	1c11dd4793	avoid cursor jitter when updating progress display When the progress display gets longer, and then shorter again, it causes the cursor to jitter back and forth. Somehow I never noticed this until this morning, but then it became intolerable to watch. To fix it, pad the progress display to the maximum length it's occupied. Sponsored-by: Svenne Krap on Patreon	2021-10-07 11:16:41 -04:00
Joey Hess	45dfddd33f	convert ExportLocation to ShortByteString to avoid PINNED memory fragmentation This adds the overhead of a copy whenever converting to/from ExportLocation and ImportLocation. borg: Some improvements to memory use when importing a lot of archives. (It's still pretty bad.) Sponsored-by: Mark Reidenbach on Patreon	2021-10-05 14:51:55 -04:00
Joey Hess	9012fa0187	reinject: Fix crash when reinjecting a file from outside the repository Commit `4bf7940d6b` introduced this problem, but was otherwise doing a good thing. Problem being that fileRef "/foo" used to return ":./foo", which was actually wrong, but as long as there was no foo in the local repository, catKey could operate on it without crashing. After that fix though, fileRef would return eg "../../foo", resulting in fileRef returning ":./../../foo", which will make git cat-file crash since that's not a valid path in the repo. Fix is simply to make fileRef detect paths outside the repo and return Nothing. Then catKey can be skipped. This needed several bugfixes to dirContains as well, in previous commits. In Command.Smudge, this led to needing to check for Nothing. That case should actually never happen, because the fileoutsiderepo check will detect it earlier. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 14:06:34 -04:00
Joey Hess	b9a1cc512d	avoid unnecessary call to inAnnex sync --content: Avoid a redundant checksum of a file that was incrementally verified, when used on NTFS and perhaps other filesystems. When sync has just gotten the content, it does not need to check inAnnex a second time. On NTFS, for some reason the write of the inode cache after it gets the content is not immediately able to be read, and with an empty/non-matching inode cache due to that stale data, inAnnex falls back to hashing the whole object to determine if it's present. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 12:02:35 -04:00
Joey Hess	b9aa2ce8d1	resume properly when copying a file to/from a local git remote is interrupted (take 2) This method avoids breaking test_readonly. Just check if the dest file exists, and avoid CoW probing when it does, so when CoW probing fails, it can resume where the previous non-CoW copy left off. If CoW has been probed already to work, delete the dest file since a CoW copy will presumably work. It seems like it would be almost as good to just skip CoW copying in this case too, but consider that the dest file might have started to be copied from some other remote, not using CoW, but CoW has been probed to work to copy from the current place. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:03:01 -04:00
Joey Hess	7ccf642863	revert change that broke test_readonly commit `63d508e885` broke test_readonly. When a local git remote is readonly, tryCopyCoW run to copy a file from it failed at withOtherTmp. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:02:41 -04:00
Joey Hess	9ea8106bb0	sped up git-annex smudge --clean by 25% Disabling git-annex branch update for this command is ok, because it does not use any information from the branch, but only logs the location when it adds a key. Sponsored-by: Dartmouth College's Datalad project	2021-09-24 14:15:20 -04:00
Joey Hess	e8496d62e4	improved bwrate limiting implementation New method is much better. Avoids unrestrained transfer at the beginning (except for the first block). Keeps right at or a few kb/s below the configured limit, with very little variation in the actual reported bandwidth. Removed the /s part of the config as it's not needed. Ready to merge. Sponsored-by: Luke Shumaker on Patreon	2021-09-22 15:27:16 -04:00
Joey Hess	05a097cde8	Merge branch 'master' into bwlimit	2021-09-22 10:48:27 -04:00
Joey Hess	55b405a965	fix remote git config vs global git config order Bug fix: Git configs such as annex.verify were incorrectly overriding per-remote git configs such as remote.name.annex-verify. This dates all the way back to 2013, commit `8a5b397ac4`, where hlint apparently somehow confused me into parsing in the wrong order. Before that it was correct. Amazing no one has noticed until now. Sponsored-by: Kevin Mueller on Patreon	2021-09-22 10:41:56 -04:00
Joey Hess	63d508e885	resume properly when copying a file to/from a local git remote is interrupted Probably this fixes a reversion, but I don't know what version broke it. This does use withOtherTmp for a temp file that could be quite large. Albeit a reflink copy that will not actually take up any space as long as the file it was copied from still exists. So if the copy cow succeeds but git-annex is interrupted just before that temp file gets renamed into the usual .git/annex/tmp/ location, there is a risk that the other temp directory ends up cluttered with a larger temp file than later. It will eventually be cleaned up, and the chances of this being a problem are small, so this seems like an acceptable thing to do. Sponsored-by: Shae Erisson on Patreon	2021-09-21 17:43:35 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100kb/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granularity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't be supported for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose display blocks the bandwidth-using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	9f38ecac1e	borg: Avoid trying to extract xattrs, ACLS, and bsdflags when retrieving from a borg repository That broke restoring on linux from a borg backup made on OSX. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-09-03 12:10:14 -04:00
Joey Hess	1a586e473b	releasing package git-annex version 8.20210903	2021-09-03 12:01:12 -04:00
Joey Hess	ec12537774	defer write permissions checking in import until after copy to repo This should complete the fix started in `6329997ac4`, fixing the actual cause of the test suite failure this time. Sponsored-by: Dartmouth College's Datalad project	2021-09-02 13:45:21 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	837116ef1e	Fix support for readonly git remotes Boolean blindness oops. (Reversion in version 8.20210621) Sponsored-by: Dartmouth College's Datalad project	2021-08-30 12:34:19 -04:00
Joey Hess	a99a84f342	add: Detect when xattrs or perhaps ACLs prevent locking down a file's content And fail with an informative message. I don't think ACLs can prevent removing the write bit, but I'm not sure, so kept it mentioning them as a possibility. Should git-annex lock also check if the write bits are able to be removed? Maybe, but the case I know about with xattrs involves cp -a copying NFS xattrs, and it's the copy of the file that is the problem. So when locking a file, I guess it will not be the copy. Sponsored-by: Dartmouth College's Datalad project	2021-08-27 14:33:01 -04:00
Joey Hess	e17342b2a0	Run cp -a with --no-preserve=xattr, to avoid problems with copied xattrs Including them breaking permissions setting on some NFS servers. Sponsored-by: Dartmouth College's Datalad project	2021-08-27 13:09:34 -04:00
Joey Hess	6d4a728455	Added annex.youtube-dl-command config This can be used to run some forks of youtube-dl. Sponsored-by: Brett Eisenberg on Patreon	2021-08-27 09:44:23 -04:00
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	4ed36b2634	Fix test suite failure on Windows It would be better if the Arbitrary instance avoided generating impossible filenames like "foo/c:bar", but probably this is the only place that splits the file from the directory and then uses the file without the directory.. At least on the quickcheck properties. Sponsored-by: Svenne Krap on Patreon	2021-08-24 14:03:29 -04:00
Joey Hess	f9b92c81f6	unused: Skip the refs/annex/last-index ref that git-annex recently started creating This was unlikely to cause any problem, but it is unsightly to mention normally hidden refs, and it might have done a bit of unnecessary work to check that ref. Sponsored-by: Noam Kremen on Patreon	2021-08-24 12:58:14 -04:00
Joey Hess	53744e132d	incremental verification for gitlfs and httpalso And that should be all the special remotes supporting it on linux now, except for in the odd edge case here and there. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:17:10 -04:00
Joey Hess	f5e09a1dbe	incremental verification for S3 Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:07:00 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	8613770b06	incremental verify for webdav special remote Sponsored-by: Dartmouth College's DANDI project	2021-08-16 17:29:32 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	ec82299730	status update I was wrong about S3 supporting tailVerify.	2021-08-16 15:15:32 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	7eb3742e4b	incremental verify for chunked remotes Simply feed each chunk in turn to the incremental verifier. When resuming an interrupted retrieve, it does not do incremental verification. That would need to read the file, up to the resume point, and feed it to the incremental verifier. That seems easy to get wrong. Also it would mean extra work done before the transfer can start. Which would complicate displaying progress, and would perhaps not appear to the user as if it was resuming from where it left off. Instead, in that situation, return UnVerified, and let the verification be done in a separate pass. Granted, Annex.CopyFile does manage all that, but it's not complicated by dealing with chunks too. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:42:49 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	f1176f82a5	rsync special remote: Stop displaying rsync progress, and use git-annex's own progress display Reasons are same as in commit `cee14f147a`. (It was already done when using -J.) Sponsored-by: Mark Reidenbach on Patreon	2021-08-09 12:06:10 -04:00
Joey Hess	1acdd18ea8	deal better with clock skew situations, using vector clocks * Deal with clock skew, both forwards and backwards, when logging information to the git-annex branch. * GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1) rather than needing to be advanced each time a new change is made. * Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex. When changing a file in the git-annex branch, the vector clock to use is now determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK when set), and comparing it to the newest vector clock already in use in that file. If a newer time stamp was already in use, advance it forward by a second instead. When the clock is set to a time in the past, this avoids logging with an old timestamp, which would risk that log line later being ignored in favor of a "newer" line that is really not newer. When a log entry has been made with a clock that was set far ahead in the future, this avoids newer information being logged with an older timestamp and so being ignored in favor of that future-timestamped information. Once all clocks get fixed, this will result in the vector clocks being incremented, until finally enough time has passed that time gets back ahead of the vector clock value, and then it will return to usual operation. (This latter situation is not ideal, but it seems the best that can be done. The issue with it is, since all writers will be incrementing the last vector clock they saw, there's no way to tell when one writer made a write significantly later in time than another, so the earlier write might arbitrarily be picked when merging. This problem is why git-annex uses timestamps in the first place, rather than pure vector clocks.) Advancing forward by 1 second is somewhat arbitrary. setDead advances a timestamp by just 1 picosecond, and the vector clock could too. But then it would interfere with setDead, which wants to be overruled by any change.
So it could use 2 picoseconds or something, but that seems weird. It could just as well advance it forward by a minute or whatever, but then it would be harder for real time to catch up with the vector clock when forward clock skew had happened. A complication is that many log files contain several different pieces of information, and it may be best to only use vector clocks for the same piece of information. For example, a key's location log file contains InfoPresent/InfoMissing for each UUID, and it only looks at the vector clocks for the UUID that is being changed, and not other UUIDs. Although exactly where the dividing line is can be hard to determine. Consider metadata logs, where a field "tag" can have multiple values set at different times. Should it advance forward past the last tag? Probably. What about when a different field is set, should it look at the clocks of other fields? Perhaps not, but currently it does, and this does not seem like it will cause any problems. Another one I'm not entirely sure about is the export log, which is keyed by (fromuuid, touuid). So if multiple repos are exporting to the same remote, different vector clocks can be used for that remote. It looks like that's probably ok, because it does not try to determine what order things occurred when there was an export conflict. Sponsored-by: Jochen Bartl on Patreon	2021-08-04 12:33:46 -04:00
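The clock selection rule in the commit above (take the current time or GIT_ANNEX_VECTOR_CLOCK, and step past any newer clock already in the file) might be sketched like this. It is illustrative Python, not the actual implementation: `choose_clock` is a hypothetical name, clocks are plain floats in seconds, and treating an equal timestamp the same as a newer one is an assumption made so that a fixed GIT_ANNEX_VECTOR_CLOCK value still moves forward.

```python
import os
import time
from typing import Iterable, Optional


def choose_clock(existing: Iterable[float],
                 now: Optional[float] = None) -> float:
    """Pick the timestamp to log with, given clocks already in the file."""
    if now is None:
        # The environment variable takes the place of the wall clock when set.
        override = os.environ.get("GIT_ANNEX_VECTOR_CLOCK")
        now = float(override) if override is not None else time.time()
    newest = max(existing, default=None)
    if newest is not None and newest >= now:
        # The local clock is behind (or an earlier writer was ahead):
        # advance one second past the newest clock already in use.
        return newest + 1.0
    return now
```

With a clock fixed at 1 and an existing entry at 1, this yields 2, then 3 on the next change, which matches the commit's point that the variable no longer has to be advanced by hand.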
Joey Hess	899983058f	add: When adding a dotfile, avoid treating its name as an extension.	2021-08-03 12:22:58 -04:00
Joey Hess	9cae7c5bbf	releasing package git-annex version 8.20210803	2021-08-03 12:20:45 -04:00
Joey Hess	b3c4579c79	work around strange auto-init bug git-annex get when run as the first git-annex command in a new repo did not populate unlocked files. (Reversion in version 8.20210621) I am not entirely happy with this, because I don't understand how `428c91606b` caused the problem in the first place, and I don't fully understand how skipping calling scanAnnexedFiles during autoinit avoids the problem. Kept the explicit call to scanAnnexedFiles during git-annex init, so that when reconcileStaged is expensive, it can be made to run then, rather than at some later point when the information is needed. Sponsored-by: Brock Spratlen on Patreon	2021-07-30 18:36:03 -04:00
Joey Hess	66089e97de	Fix a rounding bug in display of data sizes Eg, showImprecise 1 1.99 returned "1.1" rather than "2". The 9 rounded upward to 10, and that was wrongly used as the decimal, rather than carrying the 1. Sponsored-by: Jack Hill on Patreon	2021-07-30 09:56:04 -04:00
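The carry described in the commit above can be illustrated with a small Python sketch. It is not a port of git-annex's showImprecise: `show_imprecise` here is a hypothetical function that assumes non-negative input and assumes trailing zeros are dropped, which is consistent with the "2" in the commit message but otherwise a guess.

```python
def show_imprecise(precision: int, n: float) -> str:
    """Format n with at most `precision` decimal digits, carrying on round-up."""
    scale = 10 ** precision
    whole = int(n)
    frac = round((n - whole) * scale)
    if frac >= scale:
        # The fraction rounded up to a full unit, so carry into the whole
        # part instead of printing the overflowed digits as the decimal.
        whole += 1
        frac = 0
    if frac == 0:
        return str(whole)
    return f"{whole}.{frac:0{precision}d}".rstrip("0")
```

For precision 1 and 1.99, the fractional part scales to 9.9 and rounds to 10; without the carry branch, that 10 would leak into the decimal position, which is the kind of mistake the commit fixes.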
Joey Hess	d2aead67bd	fsck: Detect and correct stale or missing inode caches for object files An easy way to see this in action is to have an unlocked file, and touch the object file. While all code that compares inode caches for object files needs to be prepared for this kind of problem and fall back to verification, having fsck notice it and correct it is cheap (as long as fsck is being run anyway) and ensures that if it happens for some unusual reason, there's a way for the user to notice that it's happening. Note that, when annex.thin is in use, the earlier call to isUnmodified (and also potentially earlier calls to inAnnex in eg, verifyLocationLog) will fix up the same problem silently. That might prevent the warning being displayed, although probably it still will be, because the Database.Keys write of the InodeCache will be queued but will not have happened yet. I can't see a way to improve this, but it's not great. Sponsored-by: Dartmouth College's Datalad project	2021-07-29 14:06:42 -04:00
Joey Hess	73e0cbbb19	fix problem populating pointer files This is a result of an audit of every use of getInodeCaches, to find places that misbehave when the annex object is not in the inode cache, despite pointer files for the same key being in the inode cache. Unfortunately, that is the case for objects that were in v7 repos that upgraded to v8. Added a note about this gotcha to getInodeCaches. Database.Keys.reconcileStaged, when annex.thin is set, would fail to populate pointer files in this situation. Changed it to check if the annex object is unmodified the same way inAnnex does, falling back to a checksum if the inode cache is not recorded. Sponsored-by: Dartmouth College's Datalad project	2021-07-27 14:26:49 -04:00
Joey Hess	3b5a3e168d	check if object is modified before starting to send it Fix bug that caused some transfers to incorrectly fail with "content changed while it was being sent", when the content was not changed. While I don't know how to reproduce the problem that several people reported, it is presumably due to the inode cache somehow being stale. So check isUnmodified', and if it's not modified, include the file's current inode cache in the set to accept, when checking for modification after the transfer. That seems like the right thing to do for another reason: The failure says the file changed while it was being sent, but if the object file was changed before the transfer started, that's wrong. So it needs to check before allowing the transfer at all if the file is modified. (Other calls to sameInodeCache or elemInodeCaches, when operating on inode caches from the database, could also be problematic if the inode cache is somehow getting stale. This does not address such problems.) Sponsored-by: Dartmouth College's Datalad project	2021-07-26 17:33:49 -04:00
Joey Hess	3d50b47ded	sync, merge: Added --allow-unrelated-histories option Which is the same as the git merge option. After last commit, this turns out to be needed in the test suite, and when doing git-annex import from special remote, followed by a git-annex merge. Sponsored-by: Svenne Krap on Patreon	2021-07-19 12:14:26 -04:00
Joey Hess	b6bea0d3f2	remove direct mode remnant of merging unrelated histories sync, merge, post-receive: Avoid merging unrelated histories, which used to be allowed only to support direct mode repositories. (However, sync does still merge unrelated histories when importing trees from special remotes, and the assistant still merges unrelated histories always.) See `556b2ded2b` for why this was added back in 2016, for direct mode. This is a behavior change, which might break something that was relying on sync merging unrelated histories, but git had a good reason to prevent it, since it's easy to foot shoot with it, and git-annex should follow suit. Sponsored-by: Noam Kremen on Patreon	2021-07-19 11:41:26 -04:00
Joey Hess	33a80d083a	sync --quiet * sync: When --quiet is used, run git commit, push, and pull without their usual output. * merge: When --quiet is used, run git merge without its usual output. This might also make --quiet work better for some other commands that make commits, like git-annex adjust. Sponsored-by: Kevin Mueller on Patreon	2021-07-19 11:28:47 -04:00
Joey Hess	c952c485c8	Fix retrieval of content from borg repos accessed over ssh It was making the borgrepo path absolute, even when it was an ssh repository. Made BorgRepo a newtype, to guard against accidentally treating it like a FilePath. Sponsored-by: Graham Spencer on Patreon	2021-07-15 12:39:24 -04:00
Joey Hess	dd31fe7b9e	fall back to checking lower case hash directories in normal repo Fix a bug that prevented getting content from a repository that started out as a bare repository, or had annex.crippledfilesystem set, and was converted to a non-bare repository. This unfortunately means that inAnnex check gets slowed down by a stat call in normal repos when the content is not present. Oh well, such is the cost of backwards compatibility with old mistakes. Sponsored-by: Mark Reidenbach on Patreon	2021-07-15 12:16:31 -04:00
Joey Hess	47d3dccf19	whereused implemented except --historical Sponsored-by: Jack Hill on Patreon	2021-07-14 14:27:21 -04:00
Joey Hess	065db484e0	releasing package git-annex version 8.20210714	2021-07-14 12:23:24 -04:00
Joey Hess	8885bd3c5b	assistant: honor annex.delayadd for non-large files assistant: When adding non-large files to git, honor annex.delayadd configuration. Also, don't add non-large files to git when they are still being written to. This came for free, since the changes to non-large files get queued up with the ones to large files, and run through the lsof check. Sponsored-by: Luke Shumaker on Patreon	2021-07-13 12:17:00 -04:00
Joey Hess	a6767ca81f	close bug and mention another aspect of the reversion in changelog	2021-07-12 10:45:57 -04:00
Joey Hess	6a581f8b8b	fix init reversion when core.sharedRepository = group init: Fix misbehavior when core.sharedRepository = group that caused it to enter an adjusted branch. (Reversion in version 8.20210630) Commit `4b1b9d7a83` made init call freezeContent in case there was a hook that could prevent writing in situations where perms don't. But with the above git config, freezeContent does not prevent write at all. So init needs to do what freezeContent does with a non-shared git config. Or init could check for that config, and skip the probing, since it won't actually be preventing write to any files. But that would make init too aware of details of Annex.Perms, and also would break if the git config were changed after init. Sponsored-by: Dartmouth College's Datalad project	2021-07-12 10:15:49 -04:00
Joey Hess	b885007f0e	--debug output goes to stderr again, not stdout Reversion in version 8.20210428 Sponsored-by: Dartmouth College's Datalad project	2021-07-12 09:40:38 -04:00
Joey Hess	b9db859221	addurl: Avoid crashing when used on beegfs. Sponsored-by: Dartmouth College's DANDI project	2021-07-05 13:02:40 -04:00
Joey Hess	d2c48404a8	assistant: Avoid unnecessary git repository repair In a situation where git fsck gets confused about a commit that is made while it's running. Sponsored-by: Graham Spencer on Patreon	2021-06-30 18:00:16 -04:00
Joey Hess	fd99ce6c95	releasing package git-annex version 8.20210630	2021-06-30 11:48:33 -04:00
Joey Hess	73ccf34763	closing	2021-06-30 11:47:39 -04:00
Joey Hess	6b0d732746	repair: Fix reversion in version 8.20200522 that prevented fetching missing objects from remotes In commit `dfc4e641b5` git repair was changed to use remote name, not url, when fetching. But it fetches into a temporary git repo, which doesn't have remotes configured. Oops. (In my defense, that commit was made just as covid lockdown started. But testing? Urk.) Sponsored-by: Mark Reidenbach on Patreon	2021-06-29 13:15:15 -04:00
Joey Hess	199391befe	make repair interruption safe Fixed bug that interrupting git-annex repair (or assistant) while it was fixing repository corruption would lose objects that were contained in pack files. Unpack all pack files and move objects into place before deleting the pack files. The old approach moved the pack files to a temp directory before unpacking them, which was not interruption safe. Sponsored-By: Jochen Bartl on Patreon	2021-06-29 13:14:28 -04:00
Joey Hess	b8e32e200e	addurl, importfeed: Added --no-raw option Forces eg, download with youtube-dl without falling back to raw download. Since youtube-dl failing due to an url not being supported is difficult to distinguish from it failing due to being blocked in some way, this can be useful to avoid the fallback of git-annex downloading the raw web page and adding that. Since --raw also prevents using special remotes, --no-raw also allows special remote downloads. Although it's always possible that some special remote may claim an url and fall back to raw download of the content, which --no-raw cannot prevent. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-06-27 11:14:51 -04:00
Joey Hess	3a14648142	dropping unused marks as dead Dropping an object with drop --unused or dropunused will mark it as dead, preventing fsck --all from complaining about it after it's been dropped from all repositories. If another repository still has a copy, it won't be treated as dead until it's also dropped from there. The drop has to use --unused, can't be --key or something else, because this indicates that the user has recently run git-annex unused. If it checked the unused log on every drop, bad things would happen when the unused log was out of date, eg a file used to be unused but then got re-added. Marking such a file as dead could be confusing. When the user uses --unused/dropunused, they must consider the unused information to be up-to-date. The particular workflow this enables is: git annex add foo git annex unannex foo git annex unused git annex drop --unused / dropunused git annex fsck --all # no warnings The docs for git-annex unannex say to use git-annex unused and dropunused, so the user should be pointed in this direction when they want to undo an accidental add. Sponsored-by: Brock Spratlen on Patreon	2021-06-25 15:22:26 -04:00
Joey Hess	df2001aa88	Improve display of errors when transfers fail Transfers from or to a local git repo could fail without a reason being given, if the content failed to verify, or if the object file's stat changed while it was being copied. Now display messages in these cases. Sponsored-by: Jack Hill on Patreon	2021-06-25 13:17:04 -04:00
Joey Hess	4b1b9d7a83	Added annex.freezecontent-command and annex.thawcontent-command configs Freeze first sets the file perms, and then runs freezecontent-command. Thaw runs thawcontent-command before restoring file permissions. This is in case the freeze command prevents changing file perms, as eg setting a file immutable does. Also, changing file perms tends to mess up previously set ACLs. git-annex init's probe for crippled filesystem uses them, so if file perms don't work, but freezecontent-command manages to prevent write to a file, it won't treat the filesystem as crippled. When the filesystem has been probed as crippled, the hooks are not used, because there seems to be no point then; git-annex won't be relying on locking annex objects down. Also, this avoids them being run when the file perms have not been changed, in case they somehow rely on git-annex's setting of the file perms in order to work. Sponsored-by: Dartmouth College's Datalad project	2021-06-21 14:40:52 -04:00
Joey Hess	1cc7b2661e	push synced/master before synced/git-annex sync: Partly work around github behavior that first branch to be pushed to a new repository is assumed to be the head branch, by not pushing synced/git-annex first. github expects master (or whatever the name is) to be pushed first, but git-annex sync can't, because it's got to also support pushes to non-bare repos where pushing master fails, as explained in the big comment. So pushing synced/master is not entirely a fix, but at least it makes github default to a branch with the stuff the user expects in it, not a bunch of annex log files. Aside from fixing github to not make this assumption, or improving the git push protocol to include what the current HEAD is, the only other approach I can think of is to identify git push's progress messages and display those when pushing master, while filtering out error messages about non-fast-forward etc. But git doesn't provide a way to separate out or identify its progress messages. Sponsored-by: Luke Shumaker on Patreon	2021-06-21 12:32:21 -04:00
Joey Hess	a6e281e008	releasing package git-annex version 8.20210621	2021-06-21 12:17:46 -04:00
Joey Hess	d2be68907c	drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies Eg, before with a .gitattributes like: .2 annex.numcopies=2 .1 annex.numcopies=1 And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2 would succeed, leaving just 1 copy, despite foo.2 needing 2 copies. It dropped foo.1 first and then skipped foo.2 since its content was gone. Now that the keys database includes locked files, this longstanding wart can be fixed. Sponsored-by: Noam Kremen on Patreon	2021-06-15 11:38:44 -04:00
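The rule in the commit above, that a key shared by several files must satisfy the highest numcopies any of them asks for, can be sketched as below. This is illustrative Python only: `required_copies` and `can_drop` are hypothetical names, and the glob patterns `*.1` and `*.2` are assumed stand-ins for the .gitattributes entries, which appear abbreviated in the commit message.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Tuple

Rule = Tuple[str, int]  # (glob pattern, numcopies), in file order


def required_copies(files: Iterable[str], rules: List[Rule],
                    default: int = 1) -> int:
    """Highest numcopies demanded by any file that shares the key."""
    def numcopies_for(path: str) -> int:
        value = default
        for pattern, n in rules:  # assumption: a later matching rule wins
            if fnmatch(path, pattern):
                value = n
        return value
    return max(numcopies_for(f) for f in files)


def can_drop(files: Iterable[str], rules: List[Rule],
             copies_elsewhere: int) -> bool:
    """Allow the drop only if enough other copies satisfy every file."""
    return copies_elsewhere >= required_copies(files, rules)
```

Checking foo.1 alone would permit the drop with one remaining copy; checking both files that share the key raises the requirement to two, which is the behavior the commit introduces.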
Joey Hess	af9fdf5dba	verify associated files when checking numcopies Most of this is just refactoring. But, handleDropsFrom did not verify that associated files from the keys db were still accurate, and has now been fixed to do so. A minor improvement to this would be to avoid calling catKeyFile twice on the same file, when getting the numcopies and mincopies value, in the common case where the same file has the highest value for both. But, it avoids checking every associated file, so it will scale well to lots of dups already. Sponsored-by: Kevin Mueller on Patreon	2021-06-15 11:14:52 -04:00
Joey Hess	3af4c9a29a	fix exponential blowup when adding lots of identical files This was an old problem when the files were being added unlocked, so the changelog mentions that being fixed. However, recently it's also affected locked files. The fix for locked files is kind of stupidly simple. moveAnnex already handles populating unlocked files, and only does it when the object file was not already present. So remove the redundant populateUnlockedFiles call. (That call was added all the way back in `cfaac52b88`, and has always been unnecessary.) Sponsored-by: Dartmouth College's Datalad project	2021-06-15 09:45:55 -04:00
Joey Hess	78da00c7a6	Future proof activity log parsing When the log has an activity that is not known, eg added by a future version of git-annex, it used to be treated as no activity at all, which would make git-annex expire think it should expire the repository, despite it having some kind of recent activity. Hopefully there will be no reason to add a new activity until enough time has passed that this commit is in use everywhere. Sponsored-by: Jake Vosloo on Patreon	2021-06-14 14:18:19 -04:00
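One way to picture the future-proofing in the commit above is a parser that maps unrecognized activity names to a catch-all value instead of discarding them. The Python below is only a sketch of that idea: the `UNKNOWN` member, the function names, and the "timestamp name" line layout are assumptions, not git-annex's actual activity log format or data type.

```python
from enum import Enum
from typing import Iterable


class Activity(Enum):
    FSCK = "Fsck"
    UNKNOWN = "Unknown"  # hypothetical catch-all for activities added later


def parse_activity(name: str) -> Activity:
    """Map an activity name to a known value, never to 'no activity'."""
    try:
        return Activity(name)
    except ValueError:
        return Activity.UNKNOWN


def has_recent_activity(log_lines: Iterable[str], cutoff: float) -> bool:
    """True if any activity, known or not, was logged at or after cutoff."""
    for line in log_lines:
        timestamp, _, name = line.partition(" ")
        parse_activity(name)  # unknown names still count as activity
        if float(timestamp) >= cutoff:
            return True
    return False
```

Under this scheme an entry written by a newer version still counts as recent activity, so an expiry check would not treat the repository as idle.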
Joey Hess	771a122c9e	add --size-limit option When this option is not used, there should be effectively no added overhead, thanks to the optimisation in `b3cd0cc6ba`. When an action fails on a file, the size of the file still counts toward the size limit. This was necessary to support concurrency, but also generally seems like the right choice. Most commands that operate on annexed files support the option. export and import do not, and I don't know if it would make sense for export to. Why would you want an incomplete export? sync doesn't, and while it would be easy to make it support it for transferring files, it's not clear if dropping files should also take the size limit into account. Commands like add that don't operate on annexed files don't support the option either. Exiting 101 not yet implemented. Sponsored-by: Denis Dzyubenko on Patreon	2021-06-04 16:16:53 -04:00
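The accounting rule from the commit above, where a file's size counts toward the limit even if the action on it fails, might look like this in outline. It is a hedged Python sketch: `run_with_size_limit` is a hypothetical helper, and skipping files that no longer fit (rather than stopping at the first one) is an assumption, since the commit message does not say which happens.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_size_limit(files: Iterable[Tuple[str, int]], limit: int,
                        action: Callable[[str], bool]) -> Tuple[List[str], int]:
    """Run action on files while their total size stays within limit.

    Returns (names the action succeeded on, budget left over).
    """
    remaining = limit
    succeeded = []
    for name, size in files:
        if size > remaining:
            continue  # assumption: files that do not fit are skipped
        # Charge the budget before running the action, so a failure
        # still counts toward the limit.
        remaining -= size
        if action(name):
            succeeded.append(name)
    return succeeded, remaining
```

Charging the budget up front is also what makes the scheme workable with concurrent jobs, since each job reserves its share before the outcome is known.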
Joey Hess	189fb05ffb	Added annex.adviceNoSshCaching config. Sponsored-by: Brock Spratlen on Patreon	2021-05-27 12:37:49 -04:00

1335 commits