Commit graph

2496 commits

Author SHA1 Message Date
Joey Hess
0547884eb2
importfeed: fix bug while also speeding up 12x!
* Fix bug that could make git-annex importfeed not see recently recorded
  state when configured with annex.alwayscommit=false.
* importfeed: Made "checking known urls" phase run 12 times faster.

The massive speedup is because it no longer queries for metadata
accompanying each url. Instead it processes the whole git-annex branch and
checks all metadata files for feed item ids, and uses any it finds.

This could result in a behavior change, in an unlikely situation: If a feed
id is recorded in a key's metadata, but the url gets removed, the old code
would not see that item id and would re-download it if it finds an url for
it in a feed, while the new code will see the item id. I don't think
the old behavior was intentional, and it may be that the new behavior is
better. Not gonna worry about this.
2021-04-23 12:36:56 -04:00
Joey Hess
b689f17062
refactoring 2021-04-23 11:44:10 -04:00
Joey Hess
da0a696c96
Revert "reorder another test"
This reverts commit 3e63f00f63.
2021-04-23 01:01:29 -04:00
Joey Hess
3e63f00f63
reorder another test
continuing to try to narrow down cause of failure on windows
2021-04-22 10:03:35 -04:00
Joey Hess
9b870e29fd
Merge branch 'master' into hiddenannex 2021-04-21 13:04:40 -04:00
Joey Hess
39d94919cd
reorder tests debugging windows failure
This order will work just as well, so no need to revert this change
later.
2021-04-21 13:01:41 -04:00
Joey Hess
05989556a2
start implementing hidden git-annex repositories
This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.

Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.

All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.

Currently, no UUIDs are treated as private yet, a way to configure that
is needed.

The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.

It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only internal use. That let Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.

And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concacenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
2021-04-20 15:04:53 -04:00
Joey Hess
5783a8d081
fsck: avoid redundant checksum when transfer is Verified
When downloading content from a remote, if the content is able to be
verified during the transfer, skip checksumming it a second time.

Note that in this case, the fsck output does not include "(checksum)"
which it does when the checksumming is done separately from the download.

This commit was sponsored by Brock Spratlen on Patreon.
2021-04-14 13:22:54 -04:00
Joey Hess
805d325a8d
diffdriver: Support unlocked files 2021-04-08 14:32:09 -04:00
Joey Hess
13c090b37a
use fastDebug everywhere it can be used
None of these are likely to yeild a noticable speedup though.
2021-04-06 15:41:24 -04:00
Joey Hess
1b645e1ace
added --debugfilter (and annex.debugfilter) 2021-04-05 15:31:10 -04:00
Joey Hess
aaba83795b
switch from hslogger to purpose-built Utility.Debug
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.

The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.

Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
2021-04-05 13:40:31 -04:00
Joey Hess
c2f612292a
start splitting out readonly values from AnnexState
Values in AnnexRead can be read more efficiently, without MVar overhead.
Only a few things have been moved into there, and the performance
increase so far is not likely to be noticable.

This is groundwork for putting more stuff in there, particularly a value
that indicates if debugging is enabled.

The obvious next step is to change option parsing to not run in the
Annex monad to set values in AnnexState, and instead return a pure value
that gets stored in AnnexRead.
2021-04-02 15:51:44 -04:00
Joey Hess
a8b837aaef
add git ls-tree --long parser
Not yet used, but allows getting the size of items in the tree fairly
cheaply.

I noticed that CmdLine.Seek uses ls-tree and the feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outwweigh the benefit.
2021-03-23 12:47:00 -04:00
Joey Hess
c68ba7d893
whereis: Don't include yt: prefix when showing url to content retrieved with youtube-dl
I don't think this was really intentional behavior. It may be that it was
useful to include it so it could be passed to rmurl, since without it rmurl
would not actually remove the url. Since that was changed earlier today,
now seems like a good time to clean up the display of these urls.

This commit was sponsored by Jochen Bartl on Patreon.
2021-03-22 19:56:24 -04:00
Joey Hess
637229c593
fix fsck --from --all to not fall over trying to check required content
fsck: When --from is used in combination with --all or similar options, do
not verify required content, which can't be checked properly when operating
on keys.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2021-03-22 15:08:07 -04:00
Joey Hess
0af9d1dcb6
unregisterurl: remove all forms of an url, no matter what the downloader is set to
unregisterurl: Fix a bug that caused an url to not be unregistered when it
is claimed by a special remote other than the web.

See commit f175d4cc90 for rationalle.
2021-03-22 12:17:17 -04:00
Joey Hess
f175d4cc90
rmurl: remove all forms of an url, no matter what the downloader is set to
* rmurl: When youtube-dl was used for an url, it no longer needs to be
  prefixed with "yt:" in order to be removed.
* rmurl: If an url is both used by the web and also claimed by another
  special remote, fix a bug that caused the url to to not be removed.

The youtube-dl change is a consequence of how the bug fix is implemented.
But I also think it's the right thing to do. Consider that, before,
git-annex addurl $url followed by git-annex rmurl $url would not remove the
url in the case where youtube-dl was used. That was surprising behavior.

In the unlikely case where a special remote claims an url, and it's been
added using OtherDownloader, but it was also added already as a web url,
it seems better for rmurl to remove both than to arbitrarily remove only one.

And in the case the bug report was filed for, when an url was added as a
web url, but a special remote now claims it, that should not prevent rmurl
removing the web url.

Calling setUrlMissing lets other callers of it behave differently.
Probably the calls to it in eg, Remote.External and Remote.BitTorrent are
fine, since they don't mangle the url and just remove what was provided,
and the OtherDownloader form of a bittorrent url, respectively.
I suspect unregisterurl needs to have a similar change made to rmurl, for
similar reasons.
2021-03-22 12:09:15 -04:00
Joey Hess
8bae692486
better interface for catKey'
It only needs the size, so don't require the other stuff. Should let it
be used in more places, making things faster.
2021-03-16 14:52:23 -04:00
Joey Hess
6481991208
export --json: Fill in the file field
Like import was using ActionItemWorkTreeFile, it's ok to use it for export,
even though it might not correspond with a file in the work tree.
And renamed it to ActionItemTreeFile to make that clearer.

Note that when an export has to rename files, it still uses
ActionItemOther, so file will still be null in that case, but as no file is
being transferred, that seems ok.
2021-03-12 14:11:31 -04:00
Joey Hess
1cb154f457
avoid importing deleting submodule
import: When the previously exported tree contained a submodule,
preserve it in the imported tree so it does not get deleted.

The export exclude log, which was used for non-preferred content,
now also includes the submodules. Since the log format is git ls-tree
output, this does not break backwards compatibility.
2021-03-12 13:31:21 -04:00
Joey Hess
f2a425bd92
export: When a submodule is in the tree to be exported, skip it. 2021-03-12 12:29:18 -04:00
Joey Hess
4fc5dbc942
update comment 2021-03-11 12:03:36 -04:00
Joey Hess
cdd512cd9f
simplify 2021-03-05 14:22:04 -04:00
Joey Hess
1b041f5c51
avoid logging location of GIT keys
It's not necessary to log location of GIT keys, because these files are
not annexed files and so git-annex will never need to get them.

This corresponds to code in Annex.Import that already checked before
updating the location log when handling deleted files.

Older versions of git-annex that used SHA1 keys for non-annexed files
also unncessarily updated the location log for them.

GIT keys still appear in the git-annex branch for content identifier
logs, so kept the documentation of them in backends.mdwn

This commit was sponsored by Jake Vosloo on Patreon.
2021-03-05 14:12:34 -04:00
Joey Hess
fc61915230
use GIT keys for export of non-annexed files
This solves the problem that import of such files gets confused and
converts them back to annexed files.

The import code already used GIT keys internally when it determined a
file should not be annexed. So now when it sees a GIT key that export
used, it already does the right thing.

This also means that even older version of git-annex can import and will
do the right thing, once a fixed version has exported. Still, there may
be other complications around upgrades; still need to think it all
through.

Moved gitShaKey and keyGitSha from Key to Annex.Export since they're
only used for export/import.

Documented GIT keys in backends, since they do appear in the git-annex
branch now.

This commit was sponsored by Graham Spencer on Patreon.
2021-03-05 14:12:11 -04:00
Joey Hess
a14001785e
fix --branch combined with --unlocked or --locked
Since it's using git ls-tree anyway, can just look at the file modes to see
if they're unlocked or are symlinks.
2021-03-02 13:47:27 -04:00
Joey Hess
cbf94fd13d
prep for fixing find --branch --unlocked
Added LinkType to ProvidedInfo, and unified MatchingKey with
ProvidedInfo. They're both used in the same way, so there was no real
reason to keep separate.

Note that addLocked and addUnlocked still set matchNeedsFileName,
because to handle MatchingFile, they do need it. However, they
don't use it when MatchingInfo is provided. This should be ok,
the --branch case will be able skip checking matchNeedsFileName,
since it will provide a filename in any case.
2021-03-02 13:39:31 -04:00
Joey Hess
ee4fd38ecf
remove unused contentFile = Nothing 2021-03-01 16:35:38 -04:00
Joey Hess
eb594c710e
unregisterurl: New command
Implemented by generalizing registerurl. Without the implicit batch mode
of registerurl since that is only a backwards compatability thing
(see commit 1d1054faa6).
2021-03-01 14:28:24 -04:00
Joey Hess
97ae474585
registerurl: Allow it to be used in a bare repository. 2021-03-01 14:03:03 -04:00
Joey Hess
a8b627d82b
uninit: Fix a small bug that left a lock file in .git/annex
unannex using git queue caused the queue lock to be taken after uninit had
cleaned out .git/annex. Flush the queue earlier to avoid.
2021-03-01 13:05:47 -04:00
Joey Hess
530e96b80e
fix unannex data overwrite bug
unannex, uninit: When an annexed file is modified, don't overwrite the
modified version with an older version from the annex

This commit was sponsored by Mark Reidenbach on Patreon.
2021-02-22 13:35:00 -04:00
Joey Hess
62d5a73bdd
unannex, uninit: Avoid running git rm once per annexed file, for a large speedup. 2021-02-22 12:56:11 -04:00
Joey Hess
3a66cd715f
avoid making absolute git remote path relative
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.

Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.

A few things that used fromAbsPath unncessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.

Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
2021-02-08 13:18:01 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
Joey Hess
8d4eb2d34e
get: Improve output when failing to get a file fails
showTriedRemotes lists the remotes it tried to access. So there's
no need to list those again in "Try making some of these remotes
available".
2021-01-29 15:11:19 -04:00
Joey Hess
6f78497572
When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked
Missed this when implementing it because of the default case catching
the new constructor. So, removed that default case to make sure
future types of adjusted branches don't make the same mistake.

Complicated by git-annex addurl --fast which adds the file whose content
is not present, so it needs to stay unlocked when on such a branch.

This commit was sponsored by Brock Spratlen on Patreon.
2021-01-28 12:47:46 -04:00
Joey Hess
d4aac64282
fix breakage caused by recent commit
34a535ebea broke the test suite.
Getting a file started failing in one case, because the annex object did
not have its inode cached, so was not trusted to be unmodified.

This adds something very similar to what was added to linkAnnex
in commit 2e9341a47d -- if there are not
yet any inodes cached for a key, add the inode of the annex object when
adding the inode of the unlocked file.

Feels like this should be handled in a more principled way. How
do we know the addInodeCaches call in getMoveRaceRecovery just above
this change is currently correct? It doesn't add the annex object inode
cache. Ah well, maybe sometime when I've not had my entire evening eaten
by a reversion that the test suite caught as I was cooking dinner.
2021-01-25 21:22:18 -04:00
Joey Hess
47338bf270
support modifying and running git add on an unlocked file that used an URL key
Avoids the smudge --clean filter failing because URL keys do not support
genKey. Instead the modified content will be added using the default
backend.

This commit was sponsored by Jochen Bartl on Patreon.
2021-01-25 17:37:16 -04:00
Joey Hess
34a535ebea
adjust: Fix some bad behavior when unlocked files use URL keys.
This avoids the smudge --clean filter failing on the URL keys.

git checkout runs the post-checkout hook, which runs smudge --update.
That populates all the pointer files, but it neglected to store their inode
caches in the keys db. With that done, and the keys db flushed before
smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap
check can tell the file is not modified, so will not try to re-ingest it,
which does not work with URL keys because they do not support genKey.

It also seems possible that the isUnmodifiedCheap was also failing for
non-URL keys, which would cause them to be re-ingested, leading to a lot of
extra work. I have not verified that, but don't see why it wouldn't have
happened. So this probably also speeds up checking out adjusted branches.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2021-01-25 17:25:42 -04:00
Joey Hess
6a30d04ece
Bug fix: export with -J could fail when two files had the same content.
Exporting is done inside a call to writeLockDbWhile which guarantees there
is only one process uploading to a given ExportLocation.
2021-01-13 14:50:48 -04:00
Joey Hess
09b0562ec3
test: avoid unnecessary tests of variants of git remote
Configuring chunking and encryption for a git remote has no effect, so
skip testing those variants in the TestRemote call.

It would be better if TestRemote itself could do this, but it
doesn't seem possible there. There is no way to look at a Remote and
tell if it supports chunking or encryption.

Note that, while the test suite displays output as it it's testing
exporting, it actually skips doing anything for the tests when run on
the git remote. So at least does not waste time even though the output
is not ideal.

This commit was sponsored by Noam Kremen on Patreon.
2021-01-11 13:43:55 -04:00
Joey Hess
8db09feeba
fix format of message
newlines are eaten
2021-01-11 13:14:09 -04:00
Joey Hess
6a0030a110
Behavior change: git-annex trust now needs --force
Since unconsidered use of trusted repositories can lead to data loss.

Trusted has always been this way, but it used to be acceptable for
git-annex to be set up so that data could be lost without using --force,
and most or all other ways that can happen have already been eliminated.

This commit was sponsored by Mark Reidenbach on Patreon.
2021-01-07 10:09:39 -04:00
Joey Hess
cc89699457
mincopies
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.

Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.

This commit was sponsored by Denis Dzyubenko on Patreon.
2021-01-06 14:15:19 -04:00
Joey Hess
5ce61c6b2a
add: Significantly speed up adding lots of non-large files to git
* add: Significantly speed up adding lots of non-large files to git,
  by disabling the annex smudge filter when running git add.
* add --force-small: Run git add rather than updating the index itself,
  so any other smudge filters than the annex one that may be enabled will
  be used.
2021-01-04 13:12:28 -04:00
Joey Hess
1c5fc8f047
Git.Queue: allow providing git common options like -c 2021-01-04 12:51:55 -04:00
Joey Hess
46059ab0e5
split off versionedExport from appendonly
S3 uses versionedExport, while GitLFS uses appendonly.

This is groundwork for later changes.
2020-12-28 14:37:15 -04:00
Joey Hess
6280af2901
generate more compact git-annex branch for imports
Especially from borg, where the content identifier logs
all end up being the same identical file!

But also, for other imports, the location tracking logs can,
in some cases, be identical files.

Bonus optimisation: Avoid looking up (and parsing when set)
GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to.
Although the lookup does happen at startup even when no
log will be written now.
2020-12-23 15:25:16 -04:00
Joey Hess
7916fc98a3
graft in imported tree to avoid gc
Fix a bug that could prevent getting files from an importtree=yes remote,
because the imported tree was allowed to be garbage collected.
2020-12-23 14:27:38 -04:00
Joey Hess
1574972ba9
make sync --content get from third-party populated remotes like borg 2020-12-23 12:10:39 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
15000dee07
improve thirdpartypopulated support
May actually work now.

Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
2020-12-21 16:19:44 -04:00
Joey Hess
57b03630b3
support thirdPartyPopulated
These don't have importTree in their config, because they don't support
tree import, but they do still support import, and do not support export
or key/value modification.
2020-12-21 13:49:47 -04:00
Joey Hess
771b6c64f0
Merge branch 'master' into borg 2020-12-18 16:05:09 -04:00
Joey Hess
e0062c4f93
build fix 2020-12-18 16:04:56 -04:00
Joey Hess
909318dcee
Merge branch 'master' into borg 2020-12-18 15:27:24 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that  the tree is recorded in the export log, and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
Joey Hess
f62aee0525
fix handling of importtree-only remotes
Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
w/o exporttree, so before this code was ok.

(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
2020-12-18 15:13:30 -04:00
Joey Hess
53fd1564b1
improve synopsis 2020-12-17 12:51:49 -04:00
Joey Hess
2abda21123
update 2020-12-15 16:35:06 -04:00
Joey Hess
f29d49d478
check Remote.hasKeyCheap again
In cd1676d604, it stopped using that to avoid surprising behavior
when the location log and remote content were out of sync.
But, it seems that may have changed some behavior users relied on as
well, and also Remote.hasKeyCheap should be faster than checking then
location log.

So, try Remote.hasKeyCheap first, and only if it does not have the key,
fall back to checking the location log. If the location log still thinks
it's present, go ahead and try to get it, so the user will see a failure
rather than silently skipping a file what whereis says is on the remote.

This does make slightly slower the case where the remote does not have
the key, and location log and Remote.hasKeyCheap agree, since it now
checks both. But only 1 stat slower.
2020-12-15 14:44:00 -04:00
Joey Hess
00526a6739
pass along -c options to child git-annex processes 2020-12-15 10:49:29 -04:00
Joey Hess
ed68a2166d
importfeed: Avoid using youtube-dl when a feed does not contain an enclosure, but only a link to an url which youtube-dl does not support
This is common in some feeds, which might mix some items with enclosures,
with others that link to posts or whatever. Before this, it would try to
use youtube-dl and fail, or if youtube-dl was not allowed, it would
incorrectly complain that an url was supported by youtube-dl.
2020-12-15 01:13:21 -04:00
Joey Hess
01527b21d8
add key to FileInfo
MatchingKey is not the thing to use when matching on actual worktreee
files.

Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
2020-12-14 17:42:02 -04:00
Joey Hess
4a8723246d
avoid transferrer committing the git-annex branch on shutdown
The parent is will do it when it shuts down, and having both of them
trying to do it at the same time seems like something good to avoid.
2020-12-11 16:16:07 -04:00
Joey Hess
d3f78da0ed
propagate signals to the transferrer process group
Done on unix, could not implement it on windows quite.

The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.

Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-11 15:32:00 -04:00
Joey Hess
a422a056f2
make getViaTmpFrom no longer update location log
All callers adjusted to update it themselves.

In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.

This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
2020-12-11 11:50:13 -04:00
Joey Hess
cedad7b37d
refactor 2020-12-10 16:33:52 -04:00
Joey Hess
04c12aa6df
custom protocol for transferrer
Rather than using Read/Show, which would force me to preserve data types
into the future.

I considered just deriving json and sending that, but I don't much like
deriving json with data types that have named constructors (like Key
does) because again it locks in data type details.

So instead, used SimpleProtocol, with a fairly complex and unreadable
protocol. But it is as efficient as the p2p protocol at least, and as
future proof.

(Writing my own custom json instances would have worked but I thought
of it too late and don't want to do all the work twice. The only real
benefit might be that aeson could be faster.)

Note that, when a new protocol request type is added later, git-annex
trying to use it will cause the git-annex transferrer to display a
protocol error message. That seems ok; it would only happen if a new
git-annex found an old version of itself in PATH or the program
file. So it's unlikely, and all it can do anyway is display an error.
(The error message could perhaps be improved..)

This commit was sponsored by Jack Hill on Patreon.
2020-12-09 16:13:59 -04:00
Joey Hess
004a4f5fb1
factor out Types.Transferrer 2020-12-09 13:28:49 -04:00
Joey Hess
677003a6df
rename helper
More consistent name with TransferrerPool
2020-12-09 13:24:24 -04:00
Joey Hess
05c0543e8e
move new interface to git-annex transfer
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-12-09 12:33:56 -04:00
Joey Hess
fcc9e01556
finally using transferkeys
Seems to work! Even progress bars. Have not tested prompting or various
error message displays yet.

transferkeys had to be made to operate in different modes for the
Assistant and Annex monads. A bit ugly, but it did relegate that
really ugly Database.Keys.closeDb in transferkeys to only the assistant
code path.

This commit was sponsored by Noam Kremen.
2020-12-07 16:18:26 -04:00
Joey Hess
4c47568876
refactoring
This is groundwork for using git-annex transferkeys to run transfers,
in order to allow stalled transfers to be interrupted and retried.

The new upload and download are closer to what git-annex transferkeys
does, so the plan is to make them use it.

Then things that were left using upload' and download' won't recover
from stalls. Notably, that includes import and export. But
at least get/move/copy will be able to. (Also the assistant hopefully,
but not yet.)

This commit was sponsored by Jake Vosloo on Patreon.
2020-12-07 14:49:17 -04:00
Joey Hess
438d5be1f7
support prompt in message serialization
That seems to be the last thing needed for message serialization.
Although it's only used in the assistant currently, so hard to tell if I
forgot something.

At this point, it should be possible to start using transferkeys
when performing transfers, which will allow killing a transferkeys
process if a transfer times out or stalls. But that's for another day.

This commit was sponsored by Ethan Aubin.
2020-12-04 14:54:09 -04:00
Joey Hess
7a9b618d5d
fix problem with last commit and assistant
liftAnnex blocks all others calls, so avoid using it with a long-duration
call to readResponse.
2020-12-04 12:20:04 -04:00
Joey Hess
cad147cbbf
new protocol for transferkeys, with message serialization
Necessarily threw out the old protocol, so if an old git-annex assistant
is running, and starts a transferkeys from the new git-annex, it would
fail. But, that seems unlikely; the assistant starts up transferkeys
processes and then keeps them running. Still, may need to test that
scenario.

The new protocol is simple read/show and looks like this:

TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo"))
TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9}))
TransferOutput (OutputMessage "(checksum...) ")
TransferResult True

Granted, this is not optimally fast, but it seems good enough, and is
probably nearly as fast as the old protocol anyhow.

emitSerializedOutput for ProgressMeter is not yet implemented. It needs
to somehow start or update a progress meter. There may need to be a new
message that allocates a progress meter, and then have ProgressMeter
update it.

This commit was sponsored by Ethan Aubin
2020-12-03 16:21:20 -04:00
Joey Hess
a3b714ddd9
finish fixing removeLink on windows
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink

This commit was sponsored by Ethan Aubin.
2020-11-24 13:20:44 -04:00
Joey Hess
631c8d3e5b
avoid redundant adjusted branch update in sync
sync still does update it if the config would otherwise not, since it
already did.
2020-11-16 15:13:48 -04:00
Joey Hess
0896038ba7
annex.adjustedbranchrefresh
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.

Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:

* If two files point to one file, then eg, `git annex get foo` will
  update the branch to unlock foo, but will not unlock bar, because it
  does not know about it. Might be fixable by making `git annex get
  bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
  of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info

Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
2020-11-16 14:27:28 -04:00
Joey Hess
26cf26caca
Merge branch 'master' into symlink-missing 2020-11-16 10:03:12 -04:00
Joey Hess
5a8d01f63e
examinekey: Added a "file" format variable
For consistency with find, and for easier scripting.
2020-11-16 09:59:11 -04:00
Joey Hess
ccfa9b2dc4
make sync update --unlock-present branch 2020-11-13 15:04:34 -04:00
Joey Hess
e66b7d2e1b
rename to --unlock-present and better reverse adjusting
An --unlock-present branch reverses back to a branch where
all files that get modified or renamed become locked, even if they were
originally unlocked. This is the same that reversing a --unlock branch
works, and the new name makes that commonality more clear.
2020-11-13 14:56:43 -04:00
Joey Hess
3899e216af
Merge branch 'master' into symlink-missing 2020-11-13 14:19:45 -04:00
Joey Hess
a30030c4a6
move: Fix a regression in the last release that made move --to not honor numcopies settings
This commit was sponsored by Svenne Krap on Patreon.
2020-11-13 14:19:32 -04:00
Joey Hess
c8e49c5ef5
git-annex adjust --lock-missing
Like --hide-missing the branch does not get updated when content
availability changes.

Seems to basically work, but sync does not update it yet.

Also, when a file is present and so unlocked, git mv followed by
git-annex sync results in the basis branch being updated to contain the
file with the new name, unlocked. This seems different than what
happens in an adjusted unlocked branch, where the commit propigates back
locked. Probably the reverse adjustment code needs to be improved to
handle this case.
2020-11-13 13:39:44 -04:00
Joey Hess
7566aa6bc5
examinekey: Added --migrate-to-backend
Note that, the way the SeekInput parser is written to support batch mode,
it's actually possible to do git-annex examinekey
"SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E

While that might be kind of useful to support multiple migrations not using
batch mode, I have not documented it. It would be better to take pairs of
key and file in that case.
2020-11-12 14:09:14 -04:00
Joey Hess
12e32d1dee
examinekey: Added two new format variables: objectpath and objectpointer 2020-11-12 13:02:31 -04:00
Joey Hess
92b7b1964d
add warning on add of annex link
Warn when adding a annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.

This commit was sponsored by Jake Vosloo on Patreon.
2020-11-10 12:10:51 -04:00
Joey Hess
e81bb05b25
add debug in two unusual situations 2020-11-09 17:52:06 -04:00
Joey Hess
1db49497e0
finished this stage of the RawFilePath conversion
This commit was sponsored by Denis Dzyubenko on Patreon.
2020-11-06 14:10:58 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
5a1e73617d
finished this stage of the RawFilePath conversion
Finally compiles again, and test suite passes.

This commit was sponsored by Brock Spratlen on Patreon.
2020-11-04 14:20:37 -04:00
Joey Hess
4bcb4030a5
more RawFilePath conversion
580/645

This commit was sponsored by Jack Hill on Patreon.
2020-11-03 18:34:27 -04:00
Joey Hess
eb42cd4d46
more RawFilePath conversion
535/645

This commit was sponsored by Brett Eisenberg on Patreon.
2020-11-03 10:11:04 -04:00