Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete, when a PUT stores to additional repositories
than the expected on, the location log is updated with the
additional UUIDs that contain the content.
Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
An oversight..
And with the work in progress proxy and cluster, there
can be additional remotes that are not listed in .git/config, but are
available. Making those more discoverable is another big benefit of
this.
One benefit of this is that a typo in annex-cluster-node config won't
init a new cluster.
Also it gets the cluster description set and is consistent with
initremote.
Fix a bug where interrupting git-annex while it is updating the git-annex
branch could lead to git fsck complaining about missing tree objects.
Interrupting git-annex while regraftexports is running in a transition
that is forgetting git-annex branch history would leave the
repository with a git-annex branch that did not contain the tree shas
listed in export.log. That lets those trees be garbage collected.
A subsequent run of the same transition then regrafts the trees listed
in export.log into the git-annex branch. But those trees have been lost.
Note that both sides of `if neednewlocalbranch` are atomic now. I had
thought only the True side needed to be, but I do think there may be
cases where the False side needs to be as well.
Sponsored-by: Dartmouth College's OpenNeuro project
When building an adjusted unlocked branch, make pointer files executable
when the annex object file is executable.
This slows down git-annex adjust --unlock/--unlock-present by needing to
stat all annex object files in the tree. Probably not a significant
slowdown compared to other work they do, but I have not benchmarked.
I chose to leave git-annex adjust --unlock marked as stable, even though
get or drop of an object file can change whether it would make the pointer
file executable. Partly because making it unstable would slow down
re-adjustment, and partly for symmetry with the handling of an unlocked
pointer file that is executable when the content is dropped, which does not
remove its execute bit.
fsck --fast was intended to disable checksumming, but checksumming is done
after transfers too. Due to the check being in the non-incremental path,
it would only affect non-incremental checksumming during a transfer,
and I'm not 100% sure that it was a problem.
Also, when using an external backend that does checksumming, fsck --fast
didn't disable it and now does.
Update its todo with remaining items.
Add changelog entry.
Simplified internals document to no longer be notes to myself, but
target users who want to understand how the data is stored
and might want to extract these repos manually.
Sponsored-by: Kevin Mueller on Patreon
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".
This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.
Sponsored-by: Luke T. Shumaker on Patreon
Test suite passes this time. When committing the adjusted branch, use
the old method to make a message that old git-annex can consume. Also
made the code accept the new message, so that eventually
commitTreeExactMessage can be removed.
Sponsored-by: Kevin Mueller on Patreon
This reverts commit cee12f6a2f.
This commit broke git-annex init run in a repo that was cloned from a
repo with an adjusted branch checked out.
The problem is that findAdjustingCommit was not able to identify the
commit that created the adjusted branch. It seems that there is an extra
"\n" at the end of the commit message that it does not expect.
Since backwards compatability needs to be maintained, cannot just make
findAdjustingCommit accept it with the "\n". Will have to instead
have one commitTree variant that uses the old method, and use it for
adjusted branch committing.
sync, assist, import: Allow -m option to be specified multiple times, to
provide additional paragraphs for the commit message.
The option parser didn't allow multiple -m before, so there is no risk of
behavior change breaking something that was for some reason using multiple
-m already.
Pass through to git commands, so that the method used to assemble the
paragrahs is whatever git does. Which might conceivably change in the
future.
Note that git commit-tree has supported -m since git 1.7.7. commitTree
was probably not using it since it predates that version. Since the
configure script prevents building git-annex with git older than 2.1,
there is no risk that it's not supported now.
Sponsored-by: Nicholas Golder-Manning on Patreon
While redundant concurrent transfers were already prevented in most
cases, it failed to prevent the case where two different repositories were
sending the same content to the same repository. By removing the uuid
from the transfer lock file for Download transfers, one repository
sending content will block the other one from also sending the same
content.
In order to interoperate with old git-annex, the old lock file is still
locked, as well as locking the new one. That added a lot of extra code
and work, and the plan is to eventually stop locking the old lock file,
at some point in time when an old git-annex process is unlikely to be
running at the same time.
Note that in the case of 2 repositories both doing eg
`git-annex copy foo --to origin`
the output is not that great:
copy b (to origin...)
transfer already in progress, or unable to take transfer lock
git-annex: transfer already in progress, or unable to take transfer lock
97% 966.81 MiB 534 GiB/s 0sp2pstdio: 1 failed
Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe))
Transfer failed
Perhaps that output could be cleaned up? Anyway, it's a lot better than letting
the redundant transfer happen and then failing with an obscure error about
a temp file, which is what it did before. And it seems users don't often
try to do this, since nobody ever reported this bug to me before.
(The "97%" there is actually how far along the *other* transfer is.)
Sponsored-by: Joshua Antonishen on Patreon
What this can currently be used for is only to change an url from being
used by a special remote to being used by the web remote.
This could have been a --move-from option to registerurl. But, that would
have complicated its option and --batch processing, and also would have
complicated unregisterurl, which is implemented on top of
Command.Registerurl. So, a separate command was actually less complicated
to implement.
The generic description of the command is because I want to make this
command a catch-all for other url updating kind of things, if there are
ever any more. Also because it was hard to come up with a good name for the
specific action. I considered `git-annex moveurl`, but that seems to
indicate data is perhaps actually being moved, and seems to sit at the same
level as addurl and rmurl, and this command is at the plumbing
level of registerurl and unregisterurl.
Sponsored-by: Dartmouth College's DANDI project
Not yet implemented is recording hashes on download from web and
verifying hashes.
addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.
It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.
Sponsored-by: Joshua Antonishen on Patreon
Notice a warning with -J2 causing git-annex progress output to get slightly
messed up.
Error output would also probably do that, so perhaps it should capture
stderr and only display it when yt-dlp exited nonzero?
This option might also make sense for youtube-dl, I don't have an
installation handy anymore to check.
Except when a commit is made in a view, which changes metadata.
Make the assistant commit the git-annex branch after git commit of working
tree changes.
This allows using the annex.commitmessage-command in the assistant to
generate a commit message for the git-annex branch that relies on state
gathered during the commit of the working tree. Eg, it might reuse the
commit message.
Note that, when not using the assistant, a git-annex add still commits
the git-annex branch, so such a annex.commitmessage-command set up would
not work then. But if someone is using the assistant and wants
programmatic control over commit messages, this is useful. Someone not
using the assistant can get the same result by using annex.alwayscommit=false
during the git-annex add, and git-annex merge after they git commit.
pre-commit was never really intended to commit the git-annex branch
(except after recording changed metadata), but the assistant did sort of
rely on it. It does later commit the git-annex branch before pushing to
remotes, but I didn't want to risk building up lots of uncommitted changes
to it if that didn't happen frequently.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
Was doing a Git.Branch.commit for historical reasons to do with direct
mode, which no longer apply.
Note that the preCommitAnnexHook is no longer called in commitStaged
because git-annex installs a pre-commit hook that runs the pre-commit-annex
hook. And git commit will run the pre-commit hook.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
--raw-except=web allows using yt-dlp but not any other special remotes.
Currently this option can only be used once, trying to use it repeatedly
will make option parsing fail. Perhaps it ought to support being used more
than once, but it seemed like an unlikely use case to need that.
Note that getParsed is called repeatedly when the option is used with
several urls. While implementing DeferredParseClass would avoid that
innefficiency, it didn't seem worth the added boilerplate since
getParsed only calls byNameWithUUID which does minimal work.
Sponsored-by: Dartmouth College's DANDI project
importfeed --force: Don't treat it as a failure when an already downloaded
file exists. (Fixes a behavior change introduced in 10.20230626.)
04ee6c4c6b caused the reversion. Inside a CommandPerform, stop causes it
to fail. Before that commit, it was inside a CommandStart, where stop
causes it to skip.
Which uses yt-dlp to screen scrape the equivilant of an RSS feed.
Note that youtubedlscraped is a speed optimisation. Since yt-dlp found
the urls, we know it can download them. That avoids calling
youtubeDlSupported on each url, which makes --fast a lot faster.
Almost all the same metadata fields and file formatting fields are
populated, when yt-dlp is able to get the data. Note that yt-dlp has some
additional useful metadata that could be exposed. But, much of it is
specific to particular websites, and it would be hard to document on the
git-annex importfeed man page.
Sponsored-by: unqueued on Patreon