Commit graph

1807 commits

Author SHA1 Message Date
Joey Hess
7294d23d78
export: Added --from option
This is similar to git-annex copy --from --to, in that it downloads a
local copy, locks it for removal, uploads it, and drops it. Removal of
the temporary local copy is done without verifying numcopies for the
same reason as that command.

I do wonder, looking at this, if there's a race where the local copy
gets used as a copy to allow some other drop in the narrow window after
it is downloaded and before it gets locked for removal. That would need
some other repository to have an out of date location log that says the
repository contains a copy of the key, in order for it to try to use it
as a copy. If there is such a race, git-annex copy/move would also be
vulnerable to it. It would be better to lock it for removal before
starting to download it! That is possible in v10 repositories, which do
use a separate content lock file.

Note that, when the exported tree contains several files that use the
same key, it will be downloaded repeatedly, once per time needed to
upload it. It would be possible to avoid that extra work, but it would
complicate this since the local copy would need to be preserved, locked
for removal, until the end. Also, that would mean that interrupting the
export would leave possibly a lot of temporarily downloaded keys in the
local repository, while currently it can only leave one.
2024-08-08 12:08:55 -04:00
Joey Hess
6d96734128
updateproxy, updatecluster check annexobjects=yes
updateproxy, updatecluster: Prevent using an exporttree=yes special remote
that does not have annexobjects=yes, since it will not work.
2024-08-07 12:27:24 -04:00
Joey Hess
8864a9e353
update 2024-08-07 11:49:53 -04:00
Joey Hess
b8f8c38e88
Merge branch 'master' into exportreeplus 2024-08-07 11:28:21 -04:00
Joey Hess
509b23fa00
catch ClientError from withClientM
When getting from a P2P HTTP remote, prompt for credentials when required,
instead of failing.

This feels like it might be a bug in servant-client. withClientM's type
suggests it would not throw a ClientError. But it does in this case.
2024-08-07 11:24:34 -04:00
Joey Hess
c53f61e93f
Merge branch 'master' into exportreeplus 2024-08-06 14:46:33 -04:00
Joey Hess
3cc03b4c96
fix file corruption when proxying an upload to a special remote
The file corruption consists of each chunk of the file being duplicated.
Since chunks are typically a fixed size, it would certianly be possible
to get from a corrupted file back to the original file. But this is still
bad data loss.

Reversion was in commit fcc052bed8.
Luckily that did not make the most recent release.
2024-08-06 14:41:19 -04:00
Joey Hess
3289b1ad02
proxying to exporttree=yes annexobjects=yes basically working
It works when using git-annex sync/push/assist, or when manually sending
all content to the proxied remote before pushing to the proxy remote.
But when the push comes before the content is sent, sending content does
not update the exported tree.
2024-08-06 14:21:23 -04:00
Joey Hess
9da2860812
Merge branch 'master' into exportreeplus 2024-08-02 18:45:44 -04:00
Joey Hess
c34d1da22a
Remove debug output (to stderr)
Accidentially included in last version. Only happens when running code that
uses remoteUrl.
2024-08-02 14:13:29 -04:00
Joey Hess
28b29f63dc
initial support for annexobjects=yes
Works but some commands may need changes to support special remotes
configured this way.
2024-08-02 14:07:45 -04:00
Joey Hess
3a1f39fbdf
Avoid loading cluster log at startup
This fixes a problem with datalad's test suite, where loading the cluster
log happened to cause the git-annex branch commits to take a different
shape, with an additional commit.

It's also faster though, since many commands don't need the cluster log.

Just fill Annex.clusters with a thunk.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-07-31 15:54:14 -04:00
Joey Hess
c1bc0bffc8
releasing package git-annex version 10.20240731 2024-07-31 14:05:01 -04:00
Joey Hess
d1b641cb1e
update stack.yaml to nightly-2024-07-29 and remove stack-lts-18.13.yaml
Primarily because Windows needs a dependency bump to get stm-2.5.1
for Servant build flag.

This includes Win32-2.13.4.0 and aws-0.24 which adds some features
that windows had been missing out on as well.

Lots of warnings about head and tail will need to eventually be
addressed. Of course AFAIK the uses of it in git-annex are all safe.
2024-07-29 20:09:37 -04:00
Joey Hess
fcc052bed8
When proxying an upload to a special remote, verify the hash.
While usually uploading to a special remote does not verify the content,
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is it's the equivilant of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
2024-07-29 13:40:51 -04:00
Joey Hess
074fad819d
changelog 2024-07-29 13:09:19 -04:00
Joey Hess
60b1c53df5
preparing to merge 2024-07-29 11:22:27 -04:00
Joey Hess
0dc064a9ad
When proxying for a special remote, avoid unncessary hashing
Like the comment says, the client will do its own verification. But it was
calling verifyKeyContentPostRetrieval, which was hashing the file.
2024-07-29 11:18:03 -04:00
Joey Hess
ccbdaf0448
documentation for p2phttp 2024-07-28 17:19:27 -04:00
Joey Hess
b4d749cc91
Merge branch 'master' into httpproto 2024-07-23 21:17:06 -04:00
Joey Hess
f7404a64c0
Propagate --force to git-annex transferrer
And other child processes.
2024-07-23 21:16:56 -04:00
Joey Hess
6596fe80ec
update 2024-07-23 08:53:52 -04:00
Joey Hess
86ce3bf1e4
started servant implementation of HTTP P2P protocol 2024-07-07 12:08:10 -04:00
Joey Hess
543c610a31
REMOVE-BEFORE and GETTIMESTAMP
Only implemented server side, not used client side yet.

And not yet implemented for proxies/clusters, for which there's a build
warning about unhandled cases.

This is P2P protocol version 3. Probably will be the only change in that
version..

Added a dependency on clock to access a monotonic clock.
On i386-ancient, that is at version 0.2.0.0.
2024-07-03 17:01:58 -04:00
Joey Hess
12a0ca9656
assistant: Fix a race condition that could cause a pointer file to get ingested into the annex
This was caused by commit fb8ab2469d putting
an isPointerFile check in the wrong place. So if the file was not a pointer
file at that point, but got replaced by one before the file got locked
down, the pointer file would be ingested into the annex.

The fix is simply to move the isPointerFile check to after safeToAdd locks
down the file. Now if the file changes to a pointer file after the
isPointerFile check, ingestion will see that it changed after lockdown,
and will refuse to add it to the annex.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-07-02 12:25:30 -04:00
Joey Hess
20ebb54b6f
prep release 2024-07-01 15:13:10 -04:00
Joey Hess
0033e6c0a6
Tab completion of many commands like info and trust now includes remotes
Especially useful with proxied remotes and clusters, where the user may not
be entirely familiar with the name and can learn by tab completion.
2024-06-30 12:39:18 -04:00
Joey Hess
dbfff04fb6
update for clusters 2024-06-27 12:47:26 -04:00
Joey Hess
0ef4183b00
Merge branch 'master' into proxy 2024-06-27 12:41:57 -04:00
Joey Hess
19137ae780
avoid unfiltered debugging from git-annex-shell
When --debugfilter or annex.debugfilter is set, avoid propigating debug
output from git-annex-shell, since it cannot be filtered.

It would be possible to pass --debugfilter on to git-annex-shell,
but it only started accepting that option in 2022. So it would break
interop with older versions.
2024-06-27 12:37:25 -04:00
Joey Hess
f18740699e
P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS
Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete, when a PUT stores to additional repositories
than the expected on, the location log is updated with the
additional UUIDs that contain the content.

Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
2024-06-18 16:21:40 -04:00
Joey Hess
3970bbb03b
Merge branch 'master' into proxy 2024-06-17 09:29:34 -04:00
Joey Hess
af79728ac3
tab complete special remotes
An oversight..

And with the work in progress proxy and cluster, there
can be additional remotes that are not listed in .git/config, but are
available. Making those more discoverable is another big benefit of
this.
2024-06-17 09:26:03 -04:00
Joey Hess
570ceffe8d
broke out initcluster
One benefit of this is that a typo in annex-cluster-node config won't
init a new cluster.

Also it gets the cluster description set and is consistent with
initremote.
2024-06-14 17:23:11 -04:00
Joey Hess
bbf261487d
add git-annex updatecluster command
Seems to work fine, making the right changes to the git-annex branch.
2024-06-14 15:02:01 -04:00
Joey Hess
2844230dfe
add git configs for clusters 2024-06-14 12:20:17 -04:00
Joey Hess
649b87bedd
Merge branch 'master' into proxy 2024-06-10 14:26:18 -04:00
Joey Hess
b32c4c2e98
atomic git-annex branch update when regrafting in transition
Fix a bug where interrupting git-annex while it is updating the git-annex
branch could lead to git fsck complaining about missing tree objects.

Interrupting git-annex while regraftexports is running in a transition
that is forgetting git-annex branch history would leave the
repository with a git-annex branch that did not contain the tree shas
listed in export.log. That lets those trees be garbage collected.

A subsequent run of the same transition then regrafts the trees listed
in export.log into the git-annex branch. But those trees have been lost.

Note that both sides of `if neednewlocalbranch` are atomic now. I had
thought only the True side needed to be, but I do think there may be
cases where the False side needs to be as well.

Sponsored-by: Dartmouth College's OpenNeuro project
2024-06-07 16:34:10 -04:00
Joey Hess
f97f4b8bdb
Added updateproxy command and remote.name.annex-proxy configuration
So far this only records proxy information on the git-annex branch.
2024-06-04 14:52:03 -04:00
Joey Hess
da2c02162c
Fix Windows build with Win32 2.13.4+
Thanks, Oleg Tolmatcev
2024-06-03 13:04:15 -04:00
Joey Hess
abbb8f6bbf
releasing package git-annex version 10.20240531 2024-05-31 12:32:34 -04:00
Joey Hess
aeedca70ca
prep release 2024-05-30 17:53:33 -04:00
Joey Hess
98762a2f96
group: Added --list option
Seemed to make sense to exclude groups used only by dead repositories.
2024-05-29 13:37:35 -04:00
Joey Hess
3318d25c65
adjust unlocked execute bit handling
When building an adjusted unlocked branch, make pointer files executable
when the annex object file is executable.

This slows down git-annex adjust --unlock/--unlock-present by needing to
stat all annex object files in the tree. Probably not a significant
slowdown compared to other work they do, but I have not benchmarked.

I chose to leave git-annex adjust --unlock marked as stable, even though
get or drop of an object file can change whether it would make the pointer
file executable. Partly because making it unstable would slow down
re-adjustment, and partly for symmetry with the handling of an unlocked
pointer file that is executable when the content is dropped, which does not
remove its execute bit.
2024-05-28 12:39:42 -04:00
Joey Hess
22bf23782f
initremote, enableremote: Added --with-url to enable using git-remote-annex
Also sets remote.name.fetch to a typical value, same as git remote add does.
2024-05-24 14:29:36 -04:00
Joey Hess
434a88c368
Merge branch 'git-remote-annex' 2024-05-15 17:57:50 -04:00
Joey Hess
768cdee461
testremote: Really fsck downloaded objects
8844372c23 exposted a bug in testremote, it
was passing the serialized key, not the object file, to be checksummed.
2024-05-15 17:57:27 -04:00
Joey Hess
468de43d66
Merge branch 'master' into git-remote-annex 2024-05-15 17:49:12 -04:00
Joey Hess
0281f7f23e
Avoid the --fast option preventing checksumming in some cases it was not supposed to
fsck --fast was intended to disable checksumming, but checksumming is done
after transfers too. Due to the check being in the non-incremental path,
it would only affect non-incremental checksumming during a transfer,
and I'm not 100% sure that it was a problem.

Also, when using an external backend that does checksumming, fsck --fast
didn't disable it and now does.
2024-05-12 21:36:48 -04:00
Joey Hess
05684bdd6c
fsck: Fix recent reversion that made it say it was checksumming files whose content is not present
Did not track down the commit that caused the problem, but git-annex
version 10.20240431 didn't behave that way.
2024-05-12 21:23:27 -04:00
Joey Hess
dfb09ad1ad
preparing to merge git-remote-annex
Update its todo with remaining items.

Add changelog entry.

Simplified internals document to no longer be notes to myself, but
target users who want to understand how the data is stored
and might want to extract these repos manually.

Sponsored-by: Kevin Mueller on Patreon
2024-05-10 15:06:15 -04:00
Joey Hess
9dea552f9b
changelog for typo fixes
Since a few affected output messages.
2024-05-01 15:47:28 -04:00
Yaroslav Halchenko
9c2ab31549
Fix compatable typo (yet to add to codespell)
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "git-sedi compatable compatible",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
2024-05-01 15:46:25 -04:00
Joey Hess
d6ad5b9b50
releasing package git-annex version 10.20240430 2024-04-30 15:27:31 -04:00
Joey Hess
f3cca8a9f8
applied patch 2024-04-30 15:17:38 -04:00
Joey Hess
c410b2bb73
annex.maxextensions configuration
Controls how many filename extensions to preserve.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-04-18 14:23:38 -04:00
Joey Hess
d372553540
rclone special remote
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".

This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.

Sponsored-by: Luke T. Shumaker on Patreon
2024-04-17 15:20:37 -04:00
Joey Hess
2c73845d90
multiple -m second try
Test suite passes this time. When committing the adjusted branch, use
the old method to make a message that old git-annex can consume. Also
made the code accept the new message, so that eventually
commitTreeExactMessage can be removed.

Sponsored-by: Kevin Mueller on Patreon
2024-04-09 12:56:47 -04:00
Joey Hess
a8dd85ea5a
Revert "multiple -m"
This reverts commit cee12f6a2f.

This commit broke git-annex init run in a repo that was cloned from a
repo with an adjusted branch checked out.

The problem is that findAdjustingCommit was not able to identify the
commit that created the adjusted branch. It seems that there is an extra
"\n" at the end of the commit message that it does not expect.

Since backwards compatability needs to be maintained, cannot just make
findAdjustingCommit accept it with the "\n". Will have to instead
have one commitTree variant that uses the old method, and use it for
adjusted branch committing.
2024-04-02 17:29:07 -04:00
Joey Hess
cee12f6a2f
multiple -m
sync, assist, import: Allow -m option to be specified multiple times, to
provide additional paragraphs for the commit message.

The option parser didn't allow multiple -m before, so there is no risk of
behavior change breaking something that was for some reason using multiple
-m already.

Pass through to git commands, so that the method used to assemble the
paragrahs is whatever git does. Which might conceivably change in the
future.

Note that git commit-tree has supported -m since git 1.7.7. commitTree
was probably not using it since it predates that version. Since the
configure script prevents building git-annex with git older than 2.1,
there is no risk that it's not supported now.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-03-27 15:58:27 -04:00
Joey Hess
377e9fff18
fix typo 2024-03-27 12:45:40 -04:00
Joey Hess
7c5007279c
Windows: Fix escaping output to terminal when using old versions of MinTTY 2024-03-26 13:09:21 -04:00
Joey Hess
f04d9574d6
fix transfer lock file for Download to not include uuid
While redundant concurrent transfers were already prevented in most
cases, it failed to prevent the case where two different repositories were
sending the same content to the same repository. By removing the uuid
from the transfer lock file for Download transfers, one repository
sending content will block the other one from also sending the same
content.

In order to interoperate with old git-annex, the old lock file is still
locked, as well as locking the new one. That added a lot of extra code
and work, and the plan is to eventually stop locking the old lock file,
at some point in time when an old git-annex process is unlikely to be
running at the same time.

Note that in the case of 2 repositories both doing eg
`git-annex copy foo --to origin`
the output is not that great:

copy b (to origin...)
  transfer already in progress, or unable to take transfer lock
git-annex: transfer already in progress, or unable to take transfer lock
97%   966.81 MiB      534 GiB/s 0sp2pstdio: 1 failed

  Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe))

  Transfer failed

Perhaps that output could be cleaned up? Anyway, it's a lot better than letting
the redundant transfer happen and then failing with an obscure error about
a temp file, which is what it did before. And it seems users don't often
try to do this, since nobody ever reported this bug to me before.
(The "97%" there is actually how far along the *other* transfer is.)

Sponsored-by: Joshua Antonishen on Patreon
2024-03-25 14:47:46 -04:00
Joey Hess
dee249ac51
fix name of option 2024-03-22 10:54:14 -04:00
Joey Hess
016d1bee88
add reregisterurl command
What this can currently be used for is only to change an url from being
used by a special remote to being used by the web remote.

This could have been a --move-from option to registerurl. But, that would
have complicated its option and --batch processing, and also would have
complicated unregisterurl, which is implemented on top of
Command.Registerurl. So, a separate command was actually less complicated
to implement.

The generic description of the command is because I want to make this
command a catch-all for other url updating kind of things, if there are
ever any more. Also because it was hard to come up with a good name for the
specific action. I considered `git-annex moveurl`, but that seems to
indicate data is perhaps actually being moved, and seems to sit at the same
level as addurl and rmurl, and this command is at the plumbing
level of registerurl and unregisterurl.

Sponsored-by: Dartmouth College's DANDI project
2024-03-05 15:06:14 -04:00
Joey Hess
def94fbff6
update 2024-03-01 13:48:51 -04:00
Joey Hess
0f7143d226
support VURL backend
Not yet implemented is recording hashes on download from web and
verifying hashes.

addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.

It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.

Sponsored-by: Joshua Antonishen on Patreon
2024-02-29 13:48:51 -04:00
Joey Hess
c2d6c02c27
Added dependency on unbounded-delays
And stop vendoring part of it.

This is a free dependency because tasty depends on it.

Sponsored-by: Leon Schuermann on Patreon
2024-02-27 13:11:59 -04:00
Joey Hess
bee3abab14
releasing package git-annex version 10.20240227 2024-02-27 13:02:17 -04:00
Joey Hess
70cb41028e
Pass --no-warnings to yt-dlp
Notice a warning with -J2 causing git-annex progress output to get slightly
messed up.

Error output would also probably do that, so perhaps it should capture
stderr and only display it when yt-dlp exited nonzero?

This option might also make sense for youtube-dl, I don't have an
installation handy anymore to check.
2024-02-19 18:35:57 -04:00
Joey Hess
3475b09c3e
pre-commit: Avoid committing the git-annex branch
Except when a commit is made in a view, which changes metadata.

Make the assistant commit the git-annex branch after git commit of working
tree changes.

This allows using the annex.commitmessage-command in the assistant to
generate a commit message for the git-annex branch that relies on state
gathered during the commit of the working tree. Eg, it might reuse the
commit message.

Note that, when not using the assistant, a git-annex add still commits
the git-annex branch, so such a annex.commitmessage-command set up would
not work then. But if someone is using the assistant and wants
programmatic control over commit messages, this is useful. Someone not
using the assistant can get the same result by using annex.alwayscommit=false
during the git-annex add, and git-annex merge after they git commit.

pre-commit was never really intended to commit the git-annex branch
(except after recording changed metadata), but the assistant did sort of
rely on it. It does later commit the git-annex branch before pushing to
remotes, but I didn't want to risk building up lots of uncommitted changes
to it if that didn't happen frequently.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-02-12 14:42:11 -04:00
Joey Hess
68e99513f0
added annex.commitmessage-command config
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-02-12 14:35:22 -04:00
Joey Hess
21123ba368
assistant, undo: When committing, let the usual git commit hooks run
Was doing a Git.Branch.commit for historical reasons to do with direct
mode, which no longer apply.

Note that the preCommitAnnexHook is no longer called in commitStaged
because git-annex installs a pre-commit hook that runs the pre-commit-annex
hook. And git commit will run the pre-commit hook.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-02-07 16:15:35 -04:00
Joey Hess
4f8fcf707d
stack.yaml: Update to lts-22.9 and use crypton. 2024-02-06 11:08:12 -04:00
Joey Hess
6b38d0c427
addurl, importfeed: Added --raw-except option
--raw-except=web allows using yt-dlp but not any other special remotes.

Currently this option can only be used once, trying to use it repeatedly
will make option parsing fail. Perhaps it ought to support being used more
than once, but it seemed like an unlikely use case to need that.

Note that getParsed is called repeatedly when the option is used with
several urls. While implementing DeferredParseClass would avoid that
innefficiency, it didn't seem worth the added boilerplate since
getParsed only calls byNameWithUUID which does minimal work.

Sponsored-by: Dartmouth College's DANDI project
2024-02-05 15:16:25 -04:00
Joey Hess
2f3fe4d904
fix importfeed --force skip behavior reversion
importfeed --force: Don't treat it as a failure when an already downloaded
file exists. (Fixes a behavior change introduced in 10.20230626.)

04ee6c4c6b caused the reversion. Inside a CommandPerform, stop causes it
to fail. Before that commit, it was inside a CommandStart, where stop
causes it to skip.
2024-02-02 15:57:07 -04:00
Joey Hess
0c64cd30c2
compare urls irrespective of downloader
importfeed --force: Avoid creating duplicates of existing already
downloaded files when yt-dlp or a special remote was used.
2024-02-02 15:50:56 -04:00
Joey Hess
90db97d9a2
importfeed: Added --scrape option
Which uses yt-dlp to screen scrape the equivilant of an RSS feed.

Note that youtubedlscraped is a speed optimisation. Since yt-dlp found
the urls, we know it can download them. That avoids calling
youtubeDlSupported on each url, which makes --fast a lot faster.

Almost all the same metadata fields and file formatting fields are
populated, when yt-dlp is able to get the data. Note that yt-dlp has some
additional useful metadata that could be exposed. But, much of it is
specific to particular websites, and it would be hard to document on the
git-annex importfeed man page.

Sponsored-by: unqueued on Patreon
2024-01-30 15:37:29 -04:00
Joey Hess
d61633e183
releasing package git-annex version 10.20240129 2024-01-29 14:12:12 -04:00
Joey Hess
0b8ba37d12
improve changelog 2024-01-25 14:28:19 -04:00
Joey Hess
8e9ee31621
webapp: Added --port option, and annex.port config
The getSocket comment that mentioned using ":port"
in the hostname seems to have been incorrect or be out of date.
After all, the bug report came when the user first tried doing that,
and it didn't work.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-01-25 14:08:36 -04:00
Joey Hess
b9e147d282
Added --expected-present file matching option 2024-01-25 12:56:41 -04:00
Joey Hess
20567e605a
add directional stalldetection and bwlimit configs
Sponsored-by: Dartmouth College's DANDI project
2024-01-19 15:27:53 -04:00
Joey Hess
c02df79248
use watchFileSize in Remote.External.retrieveKeyFile
external: Monitor file size when getting content from external special
remotes and use that to update the progress meter, in case the external
special remote program does not report progress.

This relies on 703a70cafa to prevent ever
running the meter backwards.

Sponsored-by: Dartmouth College's DANDI project
2024-01-19 14:34:30 -04:00
Joey Hess
c2634e7df2
automatically adjust stall detection period
Improve annex.stalldetection to handle remotes that update progress less
frequently than the configured time period.

In particular, this makes remotes that don't report progress but are
chunked work when transferring a single chunk takes longer than the
specified time period.

Any remotes that just have very low update granulatity would also be
handled by this.

The change to Remote.Helper.Chunked avoids an extra progress update when
resuming an interrupted upload. In that case, the code saw first Nothing
and then Just the already transferred number of bytes, which defeated this
new heuristic. This change will mean that, when resuming an interrupted
upload to a chunked remote that does not do its own progress reporting, the
progress display does not start out displaying the amount sent so far,
until after the first chunk is sent. This behavior change does not seem
like a major problem.

About the scalefudgefactor, it seems reasonable to expect subsequent chunks
to take no more than 1.5 times as long as the first chunk to transfer.
Could set it to 1, but then any chunk taking a little longer would be
treated as a stall. 2 also seems a likely value. Even 10 might be fine?

Sponsored-by: Dartmouth College's DANDI project
2024-01-18 17:12:10 -04:00
Joey Hess
e765d3e24c
import: --message/-m option 2024-01-18 12:41:44 -04:00
Joey Hess
f6cf2dec4c
disk free checking for unsized keys
Improve disk free space checking when transferring unsized keys to
local git remotes. Since the size of the object file is known, can
check that instead.

Getting unsized keys from local git remotes does not check the actual
object size. It would be harder to handle that direction because the size
check is run locally, before anything involving the remote is done. So it
doesn't know the size of the file on the remote.

Also, transferring unsized keys to other remotes, including ssh remotes and
p2p remotes don't do disk size checking for unsized keys. This would need a
change in protocol.

(It does seem like it would be possible to implement the same thing for
directory special remotes though.)

In some sense, it might be better to not ever do disk free checking for
unsized keys, than to do it only sometimes. A user might notice this
direction working and consider it a bug that the other direction does not.
On the other hand, disk reserve checking is not implemented for most
special remotes at all, and yet it is implemented for a few, which is also
inconsistent, but best effort. And so doing this best effort seems to make
some sense. Fundamentally, if the user wants the size to always be checked,
they should not use unsized keys.

Sponsored-by: Brock Spratlen on Patreon
2024-01-16 14:29:10 -04:00
Joey Hess
7e69063a29
support annex.shared-sop-command for encryption=shared
This works well, and it interoperates with gpg in my testing (although some
SOP commands might choose to use a profile that does not so caveat emptor).

Note that for creating the Cipher, gpg --gen-random is still used. SOP
does not have an eqivilant, and as long as the user has gpg around,
which seems likely, it doesn't matter that it uses gpg here, it's not being
used for encryption. That seemed better than implementing a second way
to get high quality entropy, at least for now.

The need for the sop command to run in an empty directory has each call
to encrypt and decrypt creating a new temporary directory. That is some
unncessary overhead, though probably swamped by the overhead of running
the sop command. This could be improved in the future by passing an
already empty directory to them, or a sufficiently empty directory
(.git/annex/tmp would probably suffice).

Sponsored-by: Brett Eisenberg on Patreon
2024-01-12 13:31:18 -04:00
Joey Hess
d98f02a5b0
test annex.shared-sop-command
Test a specified Stateless OpenPGP command with eg:
git-annex test --test-git-config annex.shared-sop-command=sqop

Also documented that config and another one, but so far only the test suite
uses the configs, have not yet implemented using it for actual symmetric
encryption.

Sponsored-by: Joshua Antonishen on Patreon
2024-01-10 16:30:38 -04:00
Joey Hess
de6a297d36
assistant: When generating a gpg secret key, avoid hardcoding the key algorithm and size
This aims to future-proof gpg key generation. OpenPGP is in flux with a
conflict over standards ongoing. It seems not unlikely that different
systems will have different gpg commands that support different algorithms.

This also simplifies the code by using the --quick-gen-key interface rather
than the experimental batch interface. It seems less likely that
--quick-gen-key will break than an experimental interface (whose
documentation I can no longer find).

--quick-gen-key is supported since gpg 2.1.0 (2014).

Sponsored-by: Graham Spencer on Patreon
2024-01-09 15:31:53 -04:00
Joey Hess
2c86651180
optimise adjustTree when adding many TreeItems
The old code traversed the list of addtreeitems once per subdirectory in
the tree, so could get quite slow. Converting to Map lookups sped it up
significantly.

In my test case, git-annex import used to take about 2 minutes, when
calling adjustTree to add back excluded files to the imported tree. This
dropped it down to 6 seconds. Of which 4 seconds are the actual
enumeration of the contents of the remote, so really only 2 seconds for
this.

The path prefix map is a bit suboptimal memory-wise, since items get
stored in the map once per subdirectory on the path to the item. It
would perhaps be better to use a tree data structure.

Also it's suboptimal memory-wise that it builds two maps, as well
as retaining a reference to addtreeitems. I could not see a way around
that though.

Sponsored-by: Luke T. Shumaker on Patreon
2024-01-03 15:07:49 -04:00
Joey Hess
a5b9c2ca69
import: Sped up import from special remote when the imported tree is unchanged
I saw a nearly 2 minute speed up from this, in a repo with 56000 files some
of which are preferred content of the special remote and others not. In
such a case, addBackExportExcluded has to do a lot of work, which is
unncessary when the tree is unchanged.

When using sync --content, preferred content checking of that many files
takes about 1 minute. So this speeds up sync --content by 3x.
When using git-annex import, the speed up is much larger.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-01-02 13:57:31 -04:00
Joey Hess
a4a5ec6366
info: Added "annex sizes of repositories" table to the overall display
Thanks to previous work in 11cc9f1933,
this is almost entirely free, it only needs to do some additional map
lookups and math.

The strictness annotations keep the memory use from blowing up.

Sponsored-by: unqueued on Patreon
2023-12-29 12:09:30 -04:00
Joey Hess
f3fa9dc65f
releasing package git-annex version 10.20231227 2023-12-27 19:27:55 -04:00
Joey Hess
8a3beabf35
use RawFilePath for opening sqlite databases
Fix a crash opening sqlite databases when run in a non-unicode locale,
with a remote that uses a non-unicode filepath. In that situation
converting to Text fails.

The fix needs git-annex to be built with persistent-sqlite 2.13.3.
Building against older versions still works, but that version is used when
building with stack.

Database.RawFilePath is a lot of code copied from persistent-sqlite and
lightly modified, since only 1 function in persistent-sqlite was made to
support RawFilePath. This is a bit of a pain, and I hope that
persistent-sqlite will eventually switch to using OsPath, allowing this
module to be removed from git-annex.

Sponsored-by: k0ld on Patreon
2023-12-26 18:31:52 -04:00
Joey Hess
6d789c9c81
sync, push: Avoid trying to send individual files to special remotes configured with importtree=yes exporttree=no
That will always fail. It already skipped doing this when exporttree=yes.
2023-12-26 15:56:58 -04:00
Joey Hess
aec7bed1aa
prepping for release 2023-12-26 15:40:55 -04:00
Joey Hess
9a67ed0f10
importtree: support preferred content expressions needing keys
When importing from a special remote, support preferred content expressions
that use terms that match on keys (eg "present", "copies=1"). Such terms
are ignored when importing, since the key is not known yet.

When "standard" or "groupwanted" is used, the terms in those
expressions also get pruned accordingly.

This does allow setting preferred content to "not (copies=1)" to make a
special remote into a "source" type of repository. Importing from it will
import all files. Then exporting to it will drop all files from it.

In the case of setting preferred content to "present", it's pruned on
import, so everything gets imported from it. Then on export, it's applied,
and everything in it is left on it, and no new content is exported to it.

Since the old behavior on these preferred content expressions was for
importtree to error out, there's no backwards compatability to worry about.
Except that sync/pull/etc will now import where before it errored out.
2023-12-18 16:27:59 -04:00
Joey Hess
eb59da9dd2
Lower precision of timestamps in git-annex branch
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times on one file each.
Lots of different high-resolution timestamps were recorded before
and eliminating those, after packing, the git repo was 8% smaller.

Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, running
git-annex numcopies 1; git-annex numcopies 2
The first will record the current second, while the next records
the second after that even if it runs in the same second.

As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.

The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.

If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
2023-12-11 15:04:06 -04:00
Joey Hess
86dbe9a825
migrate: support adding size back to URL keys
migrate: Support adding size to URL keys that were added with --relaxed, by
running eg: git-annex migrate --backend=URL foo

Since url keys cannot be generated, that used to fail. Make it notice that
the backend is not changed, and just get the size of the content.

Sponsored-by: Brock Spratlen on Patreon
2023-12-08 16:22:14 -04:00