Commit graph

1383 commits

Author SHA1 Message Date
Joey Hess
4a387eda54
fix oops 2021-03-26 14:37:03 -04:00
Joey Hess
f085ae4937
borg: Support importing files that are hard linked in the borg backup
Note that a key with no size field that is hard linked will
result in listImportableContents reporting a file size of 0,
rather than the actual size of the file. One result is that
the progress meter when getting the file will seem to get stuck
at 100%. Another is that the remote's preferred content expression,
if it tries to match against file size, will treat it as an empty file.
I don't see a way to improve the latter behavior, and the former behavior
is a minor enough problem.
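
For illustration, the size fallback amounts to something like this (a
sketch with a stripped-down Key type; the real accessors differ):

    -- Hedged sketch: a hard linked file whose key lacks a size field
    -- gets reported with size 0 by listImportableContents.
    import Data.Maybe (fromMaybe)

    data Key = Key { keyName :: String, keySize :: Maybe Integer }

    importableSize :: Key -> Integer
    importableSize k = fromMaybe 0 (keySize k)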

This commit was sponsored by Jake Vosloo on Patreon.
2021-03-26 13:29:34 -04:00
Joey Hess
31eb5fddf3
borg: Fix a bug that prevented importing keys of type URL and WORM
Keys stored on the filesystem are mangled by keyFile to avoid problem
chars. So, that mangling has to be reversed when parsing files from a
borg backup back to a key.

The directory special remote also mangles them this way. Some other
special remotes do not; eg, S3 just serializes the key -- but S3 object
names are not limited to filesystem-valid filenames anyway, so an S3
server must not map them directly to files in any case. It seems unlikely
that a borg backup of some such special remote will get broken by this
change.
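
To illustrate the round trip, here is a sketch of reversible mangling;
the escape table is hypothetical, and the real keyFile/fileKey pair
handles more cases:

    -- Hypothetical escape table, shown only to illustrate that
    -- parsing a filename back to a key must invert the mangling.
    mangle :: String -> String
    mangle = concatMap esc
      where
        esc '/' = "%"
        esc '%' = "&s"
        esc '&' = "&a"
        esc c   = [c]

    unmangle :: String -> String
    unmangle [] = []
    unmangle ('%':rest)     = '/' : unmangle rest
    unmangle ('&':'s':rest) = '%' : unmangle rest
    unmangle ('&':'a':rest) = '&' : unmangle rest
    unmangle (c:rest)       = c : unmangle rest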

This commit was sponsored by Graham Spencer on Patreon.
2021-03-26 12:07:00 -04:00
Joey Hess
537f9d9a11
Improved display of errors when accessing a git http remote fails.
New error message:

  Remote foo not usable by git-annex; setting annex-ignore

  http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1

If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.

This commit was sponsored by Mark Reidenbach on Patreon.
2021-03-24 14:19:32 -04:00
Joey Hess
a8b837aaef
add git ls-tree --long parser
Not yet used, but allows getting the size of items in the tree fairly
cheaply.

I noticed that CmdLine.Seek uses ls-tree and then feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outweigh the benefit.
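
For reference, a rough sketch of parsing one line of ls-tree --long
output; not the parser this commit adds, just the shape of it
("<mode> <type> <sha> <size>", a tab, then "<path>", with "-" as the
size for non-blobs):

    data TreeItem = TreeItem
        { itemMode :: String
        , itemType :: String
        , itemSha  :: String
        , itemSize :: Maybe Integer  -- Nothing when the column is "-"
        , itemFile :: FilePath
        }

    parseLsTreeLong :: String -> Maybe TreeItem
    parseLsTreeLong l = case break (== '\t') l of
        (meta, '\t':file) -> case words meta of
            [mode, typ, sha, sz] ->
                Just (TreeItem mode typ sha (readSize sz) file)
            _ -> Nothing
        _ -> Nothing
      where
        readSize s = case reads s of
            [(n, "")] -> Just n
            _         -> Nothing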
2021-03-23 12:47:00 -04:00
Joey Hess
5d75cbcdcf
webdav: deal with buggy webdav servers in renameExport
box.com already had a special case, since its renaming was known buggy.
In its case, renaming to the temp file succeeds, but then renaming the temp
file to final destination fails.

Then this 4shared server has buggy handling of renames across directories.
That was already worked around for the temp files used when storing
exports, by putting them in the same directory as the final filename, but
it also affected renameExport when the file moves between directories.

I'm not entirely clear what happens on the 4shared server when it fails
this way. It kind of looks like it may rename the file to the destination and
then still fail.

To handle both, when rename fails, delete both the source and the
destination, and fall back to uploading the content again. In the box.com
case, the temp file is the source, and deleting it makes sure the temp file
gets cleaned up. In the 4shared case, the file may have been renamed to the
destination and so cleaning that up avoids any interference with the
re-upload to the destination.
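
The fallback boils down to this shape (hypothetical signatures,
abstracted over the webdav operations; the real code is structured
differently):

    -- On rename failure, clean up both locations and report failure,
    -- so the caller re-uploads the content.
    renameWithFallback
        :: (FilePath -> FilePath -> IO Bool)  -- rename action
        -> (FilePath -> IO ())                -- delete action
        -> FilePath -> FilePath -> IO Bool
    renameWithFallback rename delete src dst = do
        ok <- rename src dst
        if ok
            then return True
            else do
                -- The buggy server may have left content at either
                -- location; remove both so the re-upload starts clean.
                delete src
                delete dst
                return False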
2021-03-22 13:08:18 -04:00
Joey Hess
0e44c252c8
avoid getting creds from environment during autoenable
When autoenabling special remotes of type S3, webdav, or glacier, do not
take login credentials from environment variables, as the user may not be
expecting the autoenable to happen, and may have those set for other
purposes.
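
Sketch of the idea, with hypothetical names (the real credential code is
more involved):

    import System.Environment (lookupEnv)

    data CredSource = AllowEnv | SkipEnv

    -- During autoenable, pass SkipEnv so environment variables
    -- set for other purposes are not picked up.
    getEnvCred :: CredSource -> String -> IO (Maybe String)
    getEnvCred SkipEnv _    = return Nothing
    getEnvCred AllowEnv var = lookupEnv var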
2021-03-17 09:41:12 -04:00
Joey Hess
3337f7c272
fix exporting when the file is in the top of the repo
takeDirectory "foo" is ".", and that will confuse webdav, so only
use that code path when there is a subdirectory.
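
The gotcha in miniature (exportDir is a hypothetical helper):

    import System.FilePath (takeDirectory)

    -- takeDirectory "foo" is ".", takeDirectory "dir/foo" is "dir";
    -- only the latter case has a collection to create.
    exportDir :: FilePath -> Maybe FilePath
    exportDir f =
        let d = takeDirectory f
        in if d == "." then Nothing else Just d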
2021-03-16 14:17:29 -04:00
Joey Hess
3e41a8f032
move
move use of </> into DavLocation so it always uses unix filepaths
(the module imports the posix version of </>)
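
The point of the import, in miniature (davLocation is a hypothetical
stand-in for the module's real functions):

    import System.FilePath.Posix ((</>))

    -- The posix </> always joins with "/", even on Windows,
    -- which is what a webdav url path needs.
    davLocation :: String -> FilePath -> String
    davLocation base f = base </> f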
2021-03-12 15:19:11 -04:00
Joey Hess
4f49c29d20
webdav: store temp file in same collection as the final export location
This may work better on some webdav servers that get confused by
cross-collection renames. I don't know, let's find out.

The only real downside of doing this is that the temp files are not all
in the top-level collection, in case an interrupted run leaves one
behind. But that does not seem especially significant.
2021-03-12 14:52:24 -04:00
Joey Hess
1d7fa63149
Added support for git-remote-gcrypt's rsync URIs
Which access a remote using rsync over ssh, and which git pushes to much
more efficiently than to ssh urls.

There was some old partial support for rsync URIs from 2013, but it seemed
incomplete, and did not use rsync over ssh. Weird.

I'm not sure if there's any remaining benefit to using the non-rsync url
forms with gcrypt, now that this is implemented? Updated docs to encourage
using the rsync urls.

This commit was sponsored by Svenne Krap on Patreon.
2021-03-09 15:58:09 -04:00
Joey Hess
6940d4ad40
change case of "transfer failed" to match "Transfer stalled" 2021-03-06 17:47:05 -04:00
Joey Hess
381f203d1a
refactor
Avoiding using a callback simplifies this and should make it easier to
implement incremental checksumming, which will need to happen partly in
writeRetrievedContent and partly in retrieveChunks.
2021-02-16 16:03:28 -04:00
Joey Hess
48310f2d55
windows build fix from jwodder 2021-02-15 13:35:01 -04:00
Joey Hess
f44d4704c6
incremental checksum for local remotes
This benchmarks only slightly faster than the old git-annex. Eg, for a 1
gb file, 14.56s vs 15.57s. (On a ram disk; there would certainly be
more of an effect if the file was written to disk and didn't stay in
cache.)

Commenting out the updateIncremental calls makes the same run take 6.31s.
It may be that overhead in the implementation, other than the actual
checksumming, is slowing it down. Eg, MVar access.

(I also tried using 10x larger chunks, which did not change the speed.)
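
For context, incremental checksumming during a copy has roughly this
shape (a sketch using cryptonite's incremental API; not the actual
implementation, which threads this through the copier):

    import Crypto.Hash (Context, Digest, SHA256, hashInit, hashUpdate, hashFinalize)
    import qualified Data.ByteString as B
    import System.IO

    -- Hash each chunk as it is written, instead of re-reading the
    -- file in a separate pass afterwards.
    hashWhileCopying :: Handle -> Handle -> IO (Digest SHA256)
    hashWhileCopying from to = go hashInit
      where
        go ctx = do
            chunk <- B.hGetSome from 65536
            if B.null chunk
                then return (hashFinalize ctx)
                else do
                    B.hPut to chunk
                    go (hashUpdate ctx chunk)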
2021-02-10 16:05:24 -04:00
Joey Hess
48f63c2798
stop using rsync in fileCopier
This is groundwork for calculating checksums while copying, rather than
in a separate pass, but that's not done yet. For now, avoid using rsync
(and cp on Windows), and instead read and write the file ourselves, with
resume handling.

Benchmarking vs old git-annex that used rsync, this is faster,
at least once the file size is larger than a couple of MB.
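
Resume handling reduces to appending past what the destination already
has; a sketch under that assumption (the real fileCopier also deals with
verification and progress updates):

    import qualified Data.ByteString as B
    import System.IO

    copyWithResume :: FilePath -> FilePath -> IO ()
    copyWithResume src dst =
        withBinaryFile dst AppendMode $ \hdst -> do
            -- Resume: skip the part of the source that was already
            -- copied by a previous, interrupted run.
            done <- hFileSize hdst
            withBinaryFile src ReadMode $ \hsrc -> do
                hSeek hsrc AbsoluteSeek done
                loop hsrc hdst
      where
        loop hsrc hdst = do
            chunk <- B.hGetSome hsrc 65536
            if B.null chunk
                then return ()
                else B.hPut hdst chunk >> loop hsrc hdst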
2021-02-10 14:44:35 -04:00
Joey Hess
c4c9b99e22
refactoring 2021-02-10 13:38:45 -04:00
Joey Hess
e24ddb8946
Bugfix: fsck --from a ssh remote did not actually check that the content on the remote is not corrupted
Changing to the P2P protocol broke this, because preseedTmp copies
the local copy of the object to the temp file, and then the P2P transfer
sees the right length file and uses it as-is.

When git-annex-shell is too old and rsync is used, it did verify the
content; it also verified the content when the local repo does not have
the object.
2021-02-10 13:29:12 -04:00
Joey Hess
1c75364eac
fix missing call to check after hard linking
This could perhaps have caused a hard link to be made when the content
of the object was modified. I don't think that actually happened,
because the annexed file would have to be unlocked, with annex.thin, for
the object to get modified, and in that case, a hard link is not made.
However, to be sure, run the check.

Note that it seemed best to run the check only once, although the
current implementation is fast and safe to run repeatedly.
2021-02-10 13:07:38 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
fa3d71d924
Tahoe: Avoid verifying hash after download, since tahoe does sufficient verification itself
See my comment in the next commit for some details about why
Verified needs a hash with preimage resistance. As far as tahoe goes,
it's fully cryptographically secure.

I think that bup could also return Verified. However, the Retriever
interface does not currently support that.
2021-02-09 13:42:16 -04:00
Joey Hess
3a66cd715f
avoid making absolute git remote path relative
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.

Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.

A few things that used fromAbsPath unnecessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.

Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
2021-02-08 13:18:01 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is;
also, leaving it up helps show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving the precaching of the remote list out of
dupState. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.
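
The probe itself is simple; a sketch with a plain byte counter
(hypothetical structure, and the real code honors annex.stalldetection's
configured size and time window):

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.STM

    -- True when no bytes arrived during the wait: likely a stall.
    probeStall :: TVar Integer -> Int -> IO Bool
    probeStall counter waitMicros = do
        before <- readTVarIO counter
        threadDelay waitMicros
        after <- readTVarIO counter
        return (after == before)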

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
Joey Hess
b372d962ae
Added GETGITREMOTENAME to external special remote protocol 2021-01-26 12:42:47 -04:00
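For reference, the message follows the protocol's usual request/reply
shape: the special remote program asks, and git-annex replies with a
VALUE line, along these lines (remote name invented):

	GETGITREMOTENAME
	VALUE myremote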
Joey Hess
b63e3118d7
fix export overwrite on FAT
Don't treat the cid of the temp file that the content has just been
written to as a cid we will accept if another file has that same
content. There's no reason to, and on FAT, due to mtime resolution,
the test suite hit just such a case.

This fixes a reversion from 73df633a62
which removed inode from the ContentIdentifier.
2021-01-25 13:31:17 -04:00
Joey Hess
73df633a62
omit inode from ContentIdentifier for directory special remote
Directory special remotes with importtree=yes now avoid unnecessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.

A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.

I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.

There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.
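
The idea in miniature (a stripped-down InodeCache; the real one records
more):

    data InodeCache = InodeCache
        { inode :: Integer
        , size  :: Integer
        , mtime :: Integer
        } deriving (Eq, Show)

    -- Zero the inode before serializing into the content identifier
    -- log, so a FAT remount does not look like a modification.
    forContentIdentifier :: InodeCache -> InodeCache
    forContentIdentifier c = c { inode = 0 }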

This commit was sponsored by Brett Eisenberg on Patreon.
2021-01-19 13:15:07 -04:00
Joey Hess
e7134ca1eb
avoid partial functions in Git.Url
After the last commit, it was able to throw errors just due to an
unparseable url. This avoids needing to worry about that, as long
as the call site has already checked that it has a parseable url.
2021-01-18 15:07:23 -04:00
Joey Hess
2aa4fab62a
avoid crashing when there are remotes using unparseable urls
Including the non-standard URI form that git-remote-gcrypt uses for rsync.

Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that, it would try to run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.
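
The shape of the fix, sketched with a hypothetical type (Git.Url's
actual representation differs):

    import Network.URI (URI, parseURI)

    -- Keep the raw string around instead of throwing, so a remote
    -- like "ook://foo:bar" can still be listed and skipped.
    data RemoteUrl = ParsedUrl URI | UnparseableUrl String

    parseRemoteUrl :: String -> RemoteUrl
    parseRemoteUrl s = maybe (UnparseableUrl s) ParsedUrl (parseURI s)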

This commit was sponsored by Luke Shumaker on Patreon.
2021-01-18 14:59:08 -04:00
Joey Hess
c8b1fa67b4
Behavior change: --trust-glacier option no longer overrides trust
Since that can lead to data loss, which should never be enabled by an
option other than --force.

This commit was sponsored by Jake Vosloo on Patreon.
2021-01-07 10:37:43 -04:00
Joey Hess
e10855e723
don't support dropping from thirdPartyPopulated for now
This code I'm reverting works. But it has a problem: The export db and
log and the ContentIdentifier db and log still list the content as being
stored in the remote. So when I ran borg create again and stored the
content in borg again in a new archive, git-annex sync noticed that, but
since it didn't update the tree for the old archives, it then thought
the content that had been removed from them was still in them, and so
git-annex get failed in an ugly way:

	Include pattern 'tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' never matched.
	[2020-12-28 16:40:44.878952393] process [933616] done ExitFailure 1

	  user error (borg ["extract","/tmp/b::abs4","tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"] exited 1)

It does not seem worth it to update the git tree for the export when dropping
content; that would make dropping many files very expensive in the number of
git tree objects created. So, let's not support this, I suppose..
2020-12-28 16:48:38 -04:00
Joey Hess
69d4b84501
support removing objects from borg 2020-12-28 16:36:52 -04:00
Joey Hess
bfdaee234f
support removal from thirdPartyPopulated
also some other fixes to thirdPartyPopulated
2020-12-28 16:36:26 -04:00
Joey Hess
b16e6fb4e6
borg appendonly config 2020-12-28 16:23:38 -04:00
Joey Hess
0990d74574
revert recent lockContent change
lockContent should only be done when it's versioned
2020-12-28 16:05:14 -04:00
Joey Hess
36133f27c0
move untrust forcing from Logs.Trust into Remote
No behavior changes here, but this is groundwork for letting remotes
such as borg vary untrust forcing depending on configuration.
2020-12-28 15:22:10 -04:00
Joey Hess
5ce7fce74a
simplify
adjustExportImport' is never called with both isexport and isimport False.
2020-12-28 15:06:47 -04:00
Joey Hess
46059ab0e5
split off versionedExport from appendonly
S3 uses versionedExport, while GitLFS uses appendonly.

This is groundwork for later changes.
2020-12-28 14:37:15 -04:00
Joey Hess
2e72590a48
avoid using export method when the remote only supports import 2020-12-23 13:40:56 -04:00
Joey Hess
e3d356fe84
borg: add subdir= config
Note that, after changing it with enableremote, syncing won't rescan
known archives in the borg repo using the changed config. Probably not a
problem?

Also used File in some places where filenames that could theoretically
start with - are passed to borg, to avoid it confusing them with
options.
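
The File/Param distinction is modeled on git-annex's Utility.SafeCommand,
simplified here to show why it matters for borg:

    data CommandParam = Param String | File FilePath

    -- A filename that starts with "-" gets a "./" prefix, so borg
    -- cannot mistake it for an option.
    toArg :: CommandParam -> String
    toArg (Param s)        = s
    toArg (File f@('-':_)) = "./" ++ f
    toArg (File f)         = f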
2020-12-23 13:12:11 -04:00
Joey Hess
4254e2297d
implement retrieveExportWithContentIdentifier
Moved out an XXX to a todo

This seems about ready to merge..
2020-12-22 16:16:48 -04:00
Joey Hess
a9d639c5b5
borg can prompt 2020-12-22 15:48:17 -04:00
Joey Hess
df4942e179
notice when an archive that was seen before gets deleted 2020-12-22 15:45:06 -04:00
Joey Hess
523b7143e0
implemented checkPresentExportWithContentIdentifier 2020-12-22 15:34:41 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
5d8e4a7c74
avoid borg list of archives that have been listed before
This makes sync a lot faster in the common case where there's no new
backup.

There's still room for it to be faster. Currently the old imported tree
has to be traversed, to generate the ImportableContents. Which then
gets turned around to generate the new imported tree, which is
identical. So, it would be possible to just return a "no new imports",
or an ImportableContents that has a way to graft in a tree. The latter
is probably too far to go to optimise this, unless other things need it.
The former might be worth it, but it's already pretty fast, since git
ls-tree is pretty fast.
2020-12-22 14:06:40 -04:00
Joey Hess
7f7094a7cb
include borg archive name in tree, use empty ContentIdentifier
It's unusual to use a ContentIdentifier that is not semi-unique
for different contents. Note that in importKeys, it checks if a content
identifier is one that's known before, to avoid downloading the same
content twice. But that's done in a code path not used for borg repos,
because they are thirdpartypopulated.
2020-12-22 11:53:00 -04:00
Joey Hess
bcd55b365c
import from borg is basically working
Still some issues to deal with, see TODO and XXX.

Here's what gets logged, for each key:

cid log:
1608582045.832799227s 6720ebad-b20e-4460-a8f2-2477361aea75 !MjAyMC0xMi0yMVQxMTozMzoxNw==:!MjAyMC0xMi0yMVQxMzowNzoyNg==

The "!Mj" are base64 encoded borg archive names, since mine were
dates and contained some characters not allowed in cid logs unescaped.
There were two archives that each contained the key. This list will grow as
more borg backups are done and learned about.

tree generated:
120000 blob 5ef6a4615c084819b44cd4e3a31657664ddf643b	x/dotgit/annex/objects/06/mv/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da
120000 blob 063a139d3021c8db60f5c576d29fada2b824d91c	x/dotgit/annex/objects/72/PP/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893
120000 blob b53b54916fd6abf21fedf796deca08d5ac7a75af	x/dotgit/annex/objects/Ww/pk/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-12-21 16:37:55 -04:00
Joey Hess
15000dee07
improve thirdpartypopulated support
May actually work now.

Note that importKey now has to add the size to the key, if it's supposed
to have a size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
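
Roughly (a stripped-down Key again; the real accessors differ):

    data Key = Key { keyName :: String, keySize :: Maybe Integer }

    -- importKey fills in the size when the key is supposed to have
    -- one but the serialized form lacked it.
    addSize :: Integer -> Key -> Key
    addSize sz k = case keySize k of
        Nothing -> k { keySize = Just sz }
        Just _  -> k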
2020-12-21 16:19:44 -04:00