Commit graph

1045 commits

Author SHA1 Message Date
Joey Hess
016d1bee88
add reregisterurl command
What this can currently be used for is only to change an url from being
used by a special remote to being used by the web remote.

This could have been a --move-from option to registerurl. But, that would
have complicated its option and --batch processing, and also would have
complicated unregisterurl, which is implemented on top of
Command.Registerurl. So, a separate command was actually less complicated
to implement.

The generic description of the command is because I want to make this
command a catch-all for other url updating kind of things, if there are
ever any more. Also because it was hard to come up with a good name for the
specific action. I considered `git-annex moveurl`, but that seems to
indicate data is perhaps actually being moved, and seems to sit at the same
level as addurl and rmurl, and this command is at the plumbing
level of registerurl and unregisterurl.

Sponsored-by: Dartmouth College's DANDI project
2024-03-05 15:06:14 -04:00
Joey Hess
e7652b0997
implement URL to VURL migration
This needs the content to be present in order to hash it. But it's not
possible for a module used by Backend.URL to call inAnnex because that
would entail a dependency loop. So instead, rely on the fact that
Command.Migrate calls inAnnex before performing a migration.

But, Command.ExamineKey calls fastMigrate and the key may or may not
exist, and it's not wanting to actually perform a migration in any case.
To handle that, had to add an additional value to fastMigrate to
indicate whether the content is inAnnex.

Factored generateEquivilantKey out of Remote.Web.

Note that migrateFromURLToVURL hardcodes use of the SHA256E backend.
It would have been difficult not to, given all the dependency loop
issues. But --backend and annex.backend are used to tell git-annex
migrate to use VURL in any case, so there's no config knob that
the user could expect to configure that.

Sponsored-by: Brock Spratlen on Patreon
2024-03-01 16:42:02 -04:00
Joey Hess
cc17ac423b
implement isCryptographicallySecureKey for VURL
Considerable difficulty to work around an import cycle. Had to move the
list of backends (except for VURL) to Backend.Variety to VURL could use
it.

Sponsored-by: Kevin Mueller on Patreon
2024-02-29 17:26:35 -04:00
Joey Hess
55bf01b788
add equivilant key log for VURL keys
When downloading a VURL from the web, make sure that the equivilant key
log is populated.

Unfortunately, this does not hash the content while it's being
downloaded from the web. There is not an interface in Backend currently
for incrementally hash generation, only for incremental verification of an
existing hash. So this might add a noticiable delay, and it has to show
a "(checksum...") message. This could stand to be improved.

But, that separate hashing step only has to happen on the first download
of new content from the web. Once the hash is known, the VURL key can have
its hash verified incrementally while downloading except when the
content in the web has changed. (Doesn't happen yet because
verifyKeyContentIncrementally is not implemented yet for VURL keys.)

Note that the equivilant key log file is formatted as a presence log.
This adds a tiny bit of overhead (eg "1 ") per line over just listing the
urls. The reason I chose to use that format is it seems possible that
there will need to be a way to remove an equivilant key at some point in
the future. I don't know why that would be necessary, but it seemed wise
to allow for the possibility.

Downloads of VURL keys from other special remotes that claim urls,
like bittorrent for example, does not popilate the equivilant key log.
So for now, no checksum verification will be done for those.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-02-29 16:01:49 -04:00
Joey Hess
c2d6c02c27
Added dependency on unbounded-delays
And stop vendoring part of it.

This is a free dependency because tasty depends on it.

Sponsored-by: Leon Schuermann on Patreon
2024-02-27 13:11:59 -04:00
Joey Hess
bee3abab14
releasing package git-annex version 10.20240227 2024-02-27 13:02:17 -04:00
Joey Hess
d61633e183
releasing package git-annex version 10.20240129 2024-01-29 14:12:12 -04:00
Joey Hess
812cbf0e17
Stateless OpenPGP interface
Implemented according to
https://www.ietf.org/archive/id/draft-dkg-openpgp-stateless-cli-09.html#name-encrypt-encrypt-a-message

Not yet used by git-annex.

Sponsored-by: Leon Schuermann on Patreon
2024-01-10 15:59:35 -04:00
Joey Hess
f3fa9dc65f
releasing package git-annex version 10.20231227 2023-12-27 19:27:55 -04:00
Joey Hess
8a3beabf35
use RawFilePath for opening sqlite databases
Fix a crash opening sqlite databases when run in a non-unicode locale,
with a remote that uses a non-unicode filepath. In that situation
converting to Text fails.

The fix needs git-annex to be built with persistent-sqlite 2.13.3.
Building against older versions still works, but that version is used when
building with stack.

Database.RawFilePath is a lot of code copied from persistent-sqlite and
lightly modified, since only 1 function in persistent-sqlite was made to
support RawFilePath. This is a bit of a pain, and I hope that
persistent-sqlite will eventually switch to using OsPath, allowing this
module to be removed from git-annex.

Sponsored-by: k0ld on Patreon
2023-12-26 18:31:52 -04:00
Joey Hess
0bd8b17b59
log migration trees to git-annex branch
This will allow distributed migration: Start a migration in one clone of
a repo, and then update other clones.

commitMigration is a bit of a bear.. There is some inversion of control
that needs some TMVars. Also streamLogFile's finalizer does not handle
recording the trees, so an interrupt at just the wrong time can cause
migration.log to be emptied but the git-annex branch not updated.

Sponsored-by: Graham Spencer on Patreon
2023-12-06 15:40:03 -04:00
Joey Hess
060259b750
add manual: true to ParallelBuild flag
hackage demands that -j be gated behind a build flag with manual: true
set.
2023-11-29 16:05:03 -04:00
Joey Hess
bacd781c4f
releasing package git-annex version 10.20231129 2023-11-29 16:01:01 -04:00
Joey Hess
cda3e85164
make my authorship explicit in the code
This is intended to guard against LLM code theft, which is the current
bubble technology de jour.

Note that authorJoeyHess' with a year older than the year I began
developing git-annex will behave badly, by intention. Eg, it will spin
and eventually crash.

This is not the first anti-LLM protection in git-annex. For example see
9562da790f. That method, while much harder
for an adversary to detect and remove, also complicates code somewhat
significantly, and needs extensions to be enabled. There are also
probably significantly fewer ways to implement that method in Haskell.
This new approach, by contrast, will be easy to add throughout the code
base, with very little effort, and without complicating reading or
maintaining it any more than noticing that yes, I am the author of this
code.

An adversary could of course remove all calls to these functions
before feeding code into their LLM-based laundry facility. I think this
would need to be done manually, or with the help of some fairly advanced
Haskell parsing though. In some cases, authorJoeyHess needs to be
removed, while in other places it needs to be replaced with a value.
Also a monadic use of authorJoeyHess' may involve other added monadic
machinery which would need to be eliminated to keep the code compiling.

Alternatively, an adversary could replace my name with something
innocuous. This would be clear intent to remove author attribution
from my code, even more than running it through an LLM laundry is.

If you work for a large company that is laundering my code through an
LLM, please do us a favor and use your immense privilege to quit and go
do something socially beneficial. I will not explain further
developments of this code in such detail, and you have better things to
do than playing cat and mouse with me as I explore directions such as
extending this approach to the type level.

Sponsored-by: k0ld on Patreon
2023-11-20 12:29:12 -04:00
Joey Hess
561c036664
split out generic git log parser
Sponsored-By: Jack Hill on Patreon
2023-11-10 15:40:03 -04:00
Joey Hess
8bde6101e3
sqlite datbase for importfeed
importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or would entagle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.

Sponsored-by: unqueued on Patreon
2023-10-23 16:46:22 -04:00
Joey Hess
c2e60dd7a6
enable parallel ghc for building git-annex
Via a build flag this time, that's off by default because hackage
demands it be so, but that gets turned on by the Makefile and by stack.
2023-09-26 13:46:44 -04:00
Joey Hess
4ac2758ba5
Revert "enable parallel ghc for building git-annex"
This reverts commit 3f6aff89b1.

Sadly hackage rejects cabal files using -j unless hidden behind an
option that is disabled by default.
2023-09-26 13:34:28 -04:00
Joey Hess
b9240d2c5d
releasing package git-annex version 10.20230926 2023-09-26 13:29:49 -04:00
Joey Hess
3f6aff89b1
enable parallel ghc for building git-annex
This drops a full recompile on my new 12 core laptop from 4:00 to 2:47.

It would be possible for me to use:
cabal configure --ghc-options=-j
But that also makes cabal parallelize ghc for each package it installs
to satisfy git-annex's dependencies. Since cabal is already configured
to parallize installing dependencies, that would use N^2 cpu cores,
which seems like a bad idea.

And also I'd have to remember to do it.

So I'm thinking it's better to do it by default. If a system that is
building git-annex is also busy with other things, let the scheduler
sort it out. If this impacts someone particularly badly, they can of
course avoid it with:
cabal configure --ghc-options=-j1
2023-09-21 13:00:31 -04:00
Joey Hess
54da44d42a
Support being built with crypton rather than cryptonite
crypton is a fork of cryptonite, and cryptonite's github repo has been
archived. Some deps are already using cryptonite so it's clearly the way
forward.

Added a build flag without a default, so cabal configure will select on its
own which to use. stack files pin to cryptonite for now.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-09-21 12:43:42 -04:00
Joey Hess
50300a47fe
Removed the vendored git-lfs and the GitLfs build flag
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.

Debian stable now includes a new enough haskell-git-lfs package as well.
Last time this was tried it did not.
2023-08-28 13:12:31 -04:00
Joey Hess
5e818e4903
remove man pages from cabal file
Since 393275c105 Setup.hs no longer
installs the man pages. Since the cabal package is only used to install
git-annex with cabal, it doesn't need to include files like these that
are not used when installing with cabal.
2023-08-28 12:57:50 -04:00
Joey Hess
cf8b30c914
oldkeys: New command that lists the keys used by old versions of a file
The tricky thing about this turned out to be handling renames and reverts.
For that, it has to make two passes over the git log, and to avoid
buffering a possibly huge amount of logs in memory (ie the whole git log of
an entire repository!), runs git log twice.

(It might be possible to speed this up by asking git log to show a diff,
and so avoid needing to use catKey.)

Sponsored-By: Brock Spratlen on Patreon
2023-08-22 14:51:06 -04:00
Joey Hess
977403d338
implement Unavilable for borg bup ddar directory rsync
Only gcrypt remains to add support for. (Well, possibly also adb?)

Sponsored-by: Luke T. Shumaker on Patreon
2023-08-16 15:48:09 -04:00
Joey Hess
be028f10e5
split out Utility.Url.Parse
This is mostly for git-repair which can't include all of Utility.Url
without adding many dependencies that are not really necessary.
2023-08-14 12:28:10 -04:00
Joey Hess
d19139a10d
releasing package git-annex version 10.20230802 2023-08-02 16:09:14 -04:00
Joey Hess
85aadcfa1e
windows back to lts-18.13 temporarily
I can't seem to get stack to resolve dependencies with Win32-2.13.4.0,
no matter what I try. Why it blows up, I don't know.

And allow-newer: true actually causes it to downgrade Win32 to the one
version that won't build. Unbelivable that allows downgrades.

So just gonna have to wait for that to get into stackage nightly, and
then stack.yaml can be updated to use that, and the changes in this
commit reverted.
2023-08-02 12:49:38 -04:00
Joey Hess
f1842b616a
fix stack build on windows
For whatever reason, putting Win32-2.13.4.0 in stack.yaml results in
stack blowing up with many unrelated dependency problems.
But making git-annex depend on that version lets stack resolve deps.
2023-08-02 11:50:12 -04:00
Joey Hess
28864f0bb2
add back utf8-string to setup build deps
Needed on Windows since Utility.FileSystemEncoding uses it
2023-08-02 09:29:23 -04:00
Joey Hess
6da6449fff
stack.yaml: Update to build with ghc-9.6.2 and aws-0.24
This enables some new features that need the new aws.

Use http-client-restricted-0.1.0 because it uses the crypton side of the
cryptonite/crypton fork, which seems to be needed for ghc-9.6.2.

Dependency on connection removed because of the cryptonite/crypton fork.
This avoids needing a build flag. It was only used to throw a typed
exception in Utility.Url, which nothing depended on.

Used a fork of bloomfilter because it's not being maintained and no longer
builds as-of this ghc version. (I have been trying to contact its
maintainer about it, and emailed him today suggesting I take over the
package.)

Sponsored-by: Brock Spratlen on Patreon
2023-08-01 18:53:26 -04:00
Joey Hess
68c9b08faf
fix build with unix-2.8.0
Changed the parameters to openFd. So needed to add a small wrapper
library to keep supporting older versions as well.
2023-08-01 18:41:27 -04:00
Joey Hess
3b825eb7a6
rewrap 2023-08-01 15:47:05 -04:00
Joey Hess
fb640bc2f4
support building with unix-compat 0.7
It removed System.PosixCompat.User.
2023-08-01 15:17:43 -04:00
Joey Hess
393275c105
Setup.hs: Stop installing man pages, desktop files, and the git-annex-shell and git-remote-tor-annex symlinks
Anything still relying on that, eg via cabal v1-install will need to
change to using make install-home. Which was added back in 2019 in
6491b62614 because cabal new-build
(now the default) already didn't use Setup in a way that let its
installation of those things work.

Notably this means Setup does not need to depend on unix-compat, which is
useful because in 0.7 it removed System.PosixCompat.User, which Setup
needed to determine where to install the desktop files. See
https://github.com/haskell-pkg-janitors/unix-compat/issues/3
2023-08-01 15:08:56 -04:00
Joey Hess
e1fc9e204e
added git-annex satisfy
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.

As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.

Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.

(Also opened a new bug that also affects sync et all, not only this command.)

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-06-29 15:34:53 -04:00
Joey Hess
1b9958f4fd
document git-annex satisfy 2023-06-29 14:15:01 -04:00
Joey Hess
a8779f4c2a
prep release 2023-06-26 10:41:36 -04:00
Joey Hess
2b2ec8fa63
lower optparse-applicative bounds after recent bump
0.14.2 included H.pretty. I tested with 0.16.1 and it displays ok using
it.
2023-06-21 12:51:45 -04:00
Peter Simons
ffb708be09
Adapt code to optparse-applicative 0.18.1 and later.
optparse-applicative switched to the 'prettyprinter' library in its latest
release, which means the 'H.text' function has disappeared. Instead, 'H.pretty'
can be used to convert all 'Pretty a' types into a renderable document.
2023-06-21 11:51:04 -04:00
Joey Hess
6821ba8dab
sync: use log to track adjusted branch needs updating
Speeds up sync in an adjusted branch by avoiding re-adjusting the branch
unncessarily, particularly when it is adjusted with --hide-missing or
--unlock-present.

When there are a lot of files, that was the majority of the time of a
--no-content sync.

Uses a log file, which is updated when content presence changes. This
adds a little bit of overhead to every file get/drop when on such an
adjusted branch. The overhead is minimal for get of any size of file,
but might be noticable for drop in some cases. It seems like a reasonable
trade-off. It would be possible to update the log file only at the end, but
then it would not happen if the command is interrupted.

When not in an adjusted branch, there should be no additional overhead.
(getCurrentBranch is an MVar read, and it avoids the MVar read of
getGitConfig.)

Note that this does not deal with situations such as:
git checkout master, git-annex get, git checkout adjusted branch,
git-annex sync. The sync won't know that the adjusted branch needs to be
updated. Dealing with that would add overhead to operation in non-adjusted
branches, which I don't like. Also, there are other situations like having
two adjusted branches that both need to be updated like this, and switching
between them and sync not updating.

This does mean a behavior change to sync, since it did previously deal
with those situations. But, the documentation did not say that it did.
The man pages only talk about sync updating the adjusted branch after
it transfers content.

I did consider making sync keep track of content it transferred (and
dropped) and only update the adjusted branch then, not to catch up to other
changes made previously. That would perform better. But it seemed rather
hard to implement, and also it would have problems with races with a
concurrent get/drop, which this implementation avoids.

And it seemed pretty likely someone had gotten used to get/drop followed by
sync updating the branch. It seems much less likely someone is switching
branches, doing get/drop, and then switching back and expecting sync to update
the branch.

Re-running git-annex adjust still does a full re-adjusting of the branch,
for anyone who needs that.

Sponsored-by: Leon Schuermann on Patreon
2023-06-08 14:35:41 -04:00
Joey Hess
c6acf574c7
implement importChanges optimisaton (not used yet)
For simplicity, I've not tried to make it handle History yet, so when
there is a history, a full import will still be done. Probably the right
way to handle history is to first diff from the current tree to the last
imported tree. Then, diff from the current tree to each of the
historical trees, and recurse through the history diffing from child tree
to parent tree.

I don't think that will need a record of the previously imported
historical trees, and so Logs.Import doesn't store them. Although I did
leave room for future expansion in that log just in case.

Next step will be to change importTree to importChanges and modify
recordImportTree et all to handle it, by using adjustTree.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-31 16:01:34 -04:00
Joey Hess
e955912ad0
git-annex assist
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.

(Also this fixes another reversion in git-annex sync,
--commit --no-commit, and --message were not enabled, oops.)

See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.

Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.

Sponsored-by: k0ld on Patreon
2023-05-18 14:37:43 -04:00
Joey Hess
80e9a655f8
add man pages for pull and push to cabal file 2023-05-18 12:54:15 -04:00
Joey Hess
5df89d58c7
git-annex pull and push
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivilant to git-annex sync.

In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.

And it would be hard to split out pull and push from sync, as far as the
implementaton goes. The implementation inside sync was easy, just adjust
SyncOptions so it does the right thing.

Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.

Sponsored-by: Kevin Mueller on Patreon
2023-05-16 16:51:07 -04:00
Joey Hess
9155ed1072
configremote
New command, currently limited to changing autoenable= setting of a special remote.

It will probably never be used for more than that given the limitations on
it.

Sponsored-by: Brock Spratlen on Patreon
2023-04-18 15:30:49 -04:00
Joey Hess
fe5e586b72
rename Git.Filename to Git.Quote 2023-04-12 17:22:03 -04:00
Joey Hess
a576fc3b12
fix mojibake reversion in display of utf8
When displaying a ByteString like "💕", safeOutput operates on
individual bytes like "\240\159\146\149" and isControl '\146' = True,
so it got truncated to just "\240".

So, only treat the low control characters, and DEL, as control
characters.

Also split Utility.Terminal out of Utility.SafeOutput. The latter needs
win32, but Utility.SafeOutput is used by Control.Exception, which is
used by Setup.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-12 13:53:30 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Joey Hess
9c242af171
releasing package git-annex version 10.20230407 2023-04-07 13:37:03 -04:00