This works well, and it interoperates with gpg in my testing (although some
SOP commands might choose to use a profile that does not, so caveat emptor).
Note that for creating the Cipher, gpg --gen-random is still used. SOP
does not have an equivalent, and as long as the user has gpg around,
which seems likely, it doesn't matter that it uses gpg here, since it's not
being used for encryption. That seemed better than implementing a second way
to get high quality entropy, at least for now.
Because the sop command needs to run in an empty directory, each call
to encrypt and decrypt creates a new temporary directory. That is some
unnecessary overhead, though probably swamped by the overhead of running
the sop command. This could be improved in the future by passing an
already empty directory to them, or a sufficiently empty directory
(.git/annex/tmp would probably suffice).
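A rough sketch of the per-call temporary directory pattern described above,
written in Python rather than git-annex's Haskell. The helper name
run_in_empty_dir is hypothetical, and the child process used in the demo is
only a stand-in for a sop command, not a real sop invocation.

```python
import subprocess
import sys
import tempfile


def run_in_empty_dir(cmd, stdin_data=b""):
    """Run cmd with a freshly created, empty temporary directory as its cwd.

    The directory is removed again once the command has exited.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        result = subprocess.run(
            cmd,
            cwd=tmpdir,
            input=stdin_data,
            stdout=subprocess.PIPE,
            check=True,
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a sop command: report what the child sees in its cwd.
    listing = run_in_empty_dir(
        [sys.executable, "-c", "import os; print(os.listdir('.'))"]
    )
    print(listing.decode().strip())
```

Creating and deleting the directory on every call is the overhead mentioned
above; reusing one already-empty directory would avoid it.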
Sponsored-by: Brett Eisenberg on Patreon
This avoids a hang approximately 1% of the time when running the test
suite on StatelessOpenPGP.
Since I've not seen git-annex hang when running git like that, I guess
git probably does something that avoids hanging similarly. Still, fixed
the same problem in Utility.Gpg too.
Sponsored-by: Kevin Mueller on Patreon
Test a specified Stateless OpenPGP command with eg:
    git-annex test --test-git-config annex.shared-sop-command=sqop
Also documented that config and another one, but so far only the test suite
uses the configs; using them for actual symmetric encryption is not yet
implemented.
Sponsored-by: Joshua Antonishen on Patreon
This aims to future-proof gpg key generation. OpenPGP is in flux with a
conflict over standards ongoing. It seems not unlikely that different
systems will have different gpg commands that support different algorithms.
This also simplifies the code by using the --quick-gen-key interface rather
than the experimental batch interface. It seems less likely that
--quick-gen-key will break than an experimental interface (whose
documentation I can no longer find).
--quick-gen-key is supported since gpg 2.1.0 (2014).
Sponsored-by: Graham Spencer on Patreon
When importing from a special remote, support preferred content expressions
that use terms that match on keys (eg "present", "copies=1"). Such terms
are ignored when importing, since the key is not known yet.
When "standard" or "groupwanted" is used, the terms in those
expressions also get pruned accordingly.
This does allow setting preferred content to "not (copies=1)" to make a
special remote into a "source" type of repository. Importing from it will
import all files. Then exporting to it will drop all files from it.
In the case of setting preferred content to "present", it's pruned on
import, so everything gets imported from it. Then on export, it's applied,
and everything in it is left on it, and no new content is exported to it.
Since the old behavior on these preferred content expressions was for
importtree to error out, there's no backwards compatibility to worry about.
Except that sync/pull/etc will now import where before it errored out.
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times on one file each.
Lots of different high-resolution timestamps were recorded before.
With those eliminated, after packing, the git repo was 8% smaller.
Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, when running
    git-annex numcopies 1; git-annex numcopies 2
the first will record the current second, while the next records
the second after that, even if it runs in the same second.
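A minimal sketch of that clock advance, in Python rather than git-annex's
Haskell. The function name next_clock and its arguments are made up for
illustration; it only models the "advance to the next second after the last
modification" rule described above, not the real log format.

```python
import time


def next_clock(last_recorded, now=None):
    """Return the whole-second timestamp to record for a new change.

    last_recorded is the newest timestamp already in the log, or None
    when nothing has been recorded yet.
    """
    current = int(time.time() if now is None else now)
    if last_recorded is None or current > last_recorded:
        return current
    # Same second (or clock behind): step past the last modification.
    return last_recorded + 1


if __name__ == "__main__":
    # Two changes made within the same wall-clock second.
    first = next_clock(None, now=1700000000.25)
    second = next_clock(first, now=1700000000.75)
    print(first, second)
```

Within a single repository the recorded values stay strictly increasing, so
the order of changes remains clear without sub-second resolution.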
As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.
The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.
If that did become a problem, this could be dialed back to, eg, logging
milliseconds, still with some space saving.
This will allow distributed migration: Start a migration in one clone of
a repo, and then update other clones.
commitMigration is a bit of a bear. There is some inversion of control
that needs some TMVars. Also streamLogFile's finalizer does not handle
recording the trees, so an interrupt at just the wrong time can cause
migration.log to be emptied but the git-annex branch not updated.
Sponsored-by: Graham Spencer on Patreon
Avoid a problem with temp file names ending in "." on certain filesystems
that have problems with such filenames.
relatedTemplate is quite an ugly hack really; since it doesn't know the max
filename length of the filesystem it can only assume that the filename is
max allowed length. When given the input "lh.aparc.DKTatlas.annot", it
wants to reserve 20 characters for tempfile so it truncates to "lh.". That
ending period is apparently a problem on some filesystems (FAT eats it, but
does not throw EINVAL; ntfs does not seem bothered by it, I don't know what
FUSE filesystem the bug reporter was really using).
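A small sketch of the truncation described above, in Python rather than
git-annex's Haskell. The name related_template_sketch and the step that
strips trailing periods are assumptions for illustration; the exact way the
fix handles the trailing "." is not spelled out in this message.

```python
# Characters kept free for the tempfile suffix, per the example above.
RESERVED_FOR_TEMP_SUFFIX = 20


def related_template_sketch(filename):
    """Truncate filename to leave room for a suffix, then drop trailing dots.

    The filesystem's real limit is unknown, so this assumes the input is
    already at the maximum allowed length and always reserves the space.
    """
    keep = max(0, len(filename) - RESERVED_FOR_TEMP_SUFFIX)
    truncated = filename[:keep]
    # Assumption: avoid templates ending in "." by stripping those periods.
    return truncated.rstrip(".")


if __name__ == "__main__":
    print(related_template_sketch("lh.aparc.DKTatlas.annot"))
```

Without the final rstrip, the example input would yield "lh.", which is the
problematic template described above.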
Sponsored-by: Brett Eisenberg on Patreon
Split out an author parameter, which will make it easier to add authors
and reads better.
Got rid of the function without the copyright year, because an adversary
could have mechanically changed the function with a copyright year to
the one without, and so bypassed the protection of LLM copyright
year hallucination.
Sponsored-by: Luke T. Shumaker on Patreon
This is intended to guard against LLM code theft, which is the current
bubble technology du jour.
Note that authorJoeyHess' with a year older than the year I began
developing git-annex will behave badly, by intention. Eg, it will spin
and eventually crash.
This is not the first anti-LLM protection in git-annex. For example see
9562da790f. That method, while much harder
for an adversary to detect and remove, also complicates code somewhat
significantly, and needs extensions to be enabled. There are also
probably significantly fewer ways to implement that method in Haskell.
This new approach, by contrast, will be easy to add throughout the code
base, with very little effort, and without complicating reading or
maintaining it any more than noticing that yes, I am the author of this
code.
An adversary could of course remove all calls to these functions
before feeding code into their LLM-based laundry facility. I think this
would need to be done manually, or with the help of some fairly advanced
Haskell parsing though. In some cases, authorJoeyHess needs to be
removed, while in other places it needs to be replaced with a value.
Also a monadic use of authorJoeyHess' may involve other added monadic
machinery which would need to be eliminated to keep the code compiling.
Alternatively, an adversary could replace my name with something
innocuous. This would be clear intent to remove author attribution
from my code, even more than running it through an LLM laundry is.
If you work for a large company that is laundering my code through an
LLM, please do us a favor and use your immense privilege to quit and go
do something socially beneficial. I will not explain further
developments of this code in such detail, and you have better things to
do than playing cat and mouse with me as I explore directions such as
extending this approach to the type level.
Sponsored-by: k0ld on Patreon
This allows getting rid of the ugly and error prone handling of
"bag of bytes" String in Remote.Helper.Encryptable.
Avoiding breakage like that dealt with by commit
9862d64bf9.
And allows converting Utility.Gpg to use ByteString for IO, which is
a welcome change.
Tested the new git-annex interoperability with old, using all 3
encryption= types.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode
characters. Luckily, genUUIDInNameSpace is only ever used on ASCII
strings as far as I can determine. In particular, git-remote-gcrypt's
gcrypt-id is an ASCII string.
Note that the use of fromString and toString from Data.ByteString.UTF8 dates
back to commit 9b93278e8a. Back then it
was using the dataenc package for base64, which operated on Word8 and
String. But with the switch to sandi, it uses ByteString, and indeed
fromB64' and toB64' were already using ByteString without that
complication. So I think there is no risk of such an encoding related
breakage.
I also tested the case that 9b93278e8a
fixed:
    git-annex metadata -s foo='a …' x
    git-annex metadata x
    metadata x
    foo=a …
In Remote.Helper.Encryptable, it was avoiding using Utility.Base64
because of that UTF8 conversion. Since that's no longer done, it can
just use it now.
The crash occurred because writeCreds got called twice, and writeFileProtected
neglected to close its file handle, so the file was open for write when
written the second time.
It seems unnecessary and suboptimal that writeCreds gets called twice.
One call is from getRemoteCredPair and the other from setRemoteCredPair'.
What happens is that in the enableremote case, code that also runs at
initremote does unnecessary work. Might be possible to improve that, but
I've gone for the simple fix.
Sponsored-by: k0ld on Patreon
crypton is a fork of cryptonite, and cryptonite's github repo has been
archived. Some deps are already using crypton so it's clearly the way
forward.
Added a build flag without a default, so cabal configure will select on its
own which to use. stack files pin to cryptonite for now.
Sponsored-by: Nicholas Golder-Manning on Patreon
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.
Debian stable now includes a new enough haskell-git-lfs package as well.
Last time this was tried it did not.
Fix behavior when importing a tree from a directory remote when the
directory does not exist. An empty tree was imported, rather than the
import failing. Merging that tree would delete every file in the
branch, if those files had been exported to the directory before.
The problem was that dirContentsRecursive returned [] when the directory
did not exist. Better for it to throw an exception. But in commit
74f0d67aa3 back in 2012, I made it never
throw exceptions, because exceptions thrown inside unsafeInterleaveIO become
untrappable when the list is being traversed.
So, changed it to list the contents of the directory before entering
unsafeInterleaveIO. So exceptions are thrown for the directory. But still
not if it's unable to list the contents of a subdirectory. That's less of a
problem, because the subdirectory does exist (or if not, it got removed
after being listed, and it's ok to not include it in the list). A
subdirectory that has permissions that don't allow listing it will have its
contents omitted from the list still.
(Might be better to have it return a type that includes indications of
errors listing contents of subdirectories?)
The rest of the changes are making callers of dirContentsRecursive
use emptyWhenDoesNotExist when they relied on the behavior of it not
throwing an exception when the directory does not exist. Note that
it's possible some callers of dirContentsRecursive that used to ignore
permissions problems listing a directory will now start throwing exceptions
on them.
The fix to the directory special remote consisted of not making its
call in listImportableContentsM use emptyWhenDoesNotExist. So it will
throw an exception as desired.
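The listing behavior described above can be sketched roughly as follows, in
Python rather than git-annex's Haskell. The names dir_contents_recursive and
empty_when_does_not_exist mirror the functions mentioned here, but the bodies
are illustrative assumptions, with a generator standing in for the lazily
produced list. Sorting is added only to make the output deterministic.

```python
import os


def dir_contents_recursive(top):
    """Return an iterator over the files under top.

    The top-level listing happens eagerly, so a missing or unreadable top
    raises OSError here, before any lazy traversal begins.
    """
    entries = sorted(os.listdir(top))  # may raise; the caller can catch it

    def walk(directory, names):
        for name in names:
            path = os.path.join(directory, name)
            if os.path.isdir(path) and not os.path.islink(path):
                try:
                    children = sorted(os.listdir(path))
                except OSError:
                    # Unlistable or vanished subdirectory: omit its contents.
                    continue
                yield from walk(path, children)
            else:
                yield path

    return walk(top, entries)


def empty_when_does_not_exist(list_dir, top):
    """Wrap a listing call for callers that want the old lenient behavior."""
    try:
        return list_dir(top)
    except FileNotFoundError:
        return iter(())
```

A caller like the directory special remote would call dir_contents_recursive
directly and let the exception propagate, while callers that relied on the
old behavior would go through empty_when_does_not_exist.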
Sponsored-by: Joshua Antonishen on Patreon
Significant startup speed increase by avoiding repeatedly checking if some
remote git-annex branch refs need to be merged when it is not newer.
One way this could happen is when there are 2 remotes that are themselves
connected. The git-annex branch on the first remote gets updated. Then the
second remote pulls from the first, and merges in its git-annex branch.
Then the local repo pulls from the second remote, and merges its git-annex
branch. At this point, a pull from the first remote will get a git-annex
branch that is not newer, but is not on the merged refs list.
In my big repo, git-annex startup time dropped from 4 seconds to 0.1 seconds.
There were 5 to 10 such remote refs out of 18 remotes.
Sponsored-by: Graham Spencer on Patreon
I can't seem to get stack to resolve dependencies with Win32-2.13.4.0,
no matter what I try. Why it blows up, I don't know.
And allow-newer: true actually causes it to downgrade Win32 to the one
version that won't build. Unbelievable that it allows downgrades.
So just gonna have to wait for that to get into stackage nightly, and
then stack.yaml can be updated to use that, and the changes in this
commit reverted.
Seems that while the module is not imported by anything on windows, it
still gets cpped, and MIN_VERSION_unix is not defined so it failed to
preprocess.
It made UserInfo into a pattern to discourage manually constructing
them, so just to use UserInfo in a type signature of a function that
consumes them, have to import the new ByteString module.
And annex.largefiles and annex.addunlocked.
Also git-annex matchexpression --explain explains why its input
expression matches or fails to match.
When there is no limit, avoid explaining why the lack of limit
matches. This is also done when no preferred content expression is set,
although in a few cases it defaults to a non-empty matcher, which will
be explained.
Sponsored-by: Dartmouth College's DANDI project
Currently it only displays explanations of options like --in and --copies.
In the future, it should explain preferred content expression evaluation
and other decisions.
The explanations of a few things could be better. In particular,
"standard" will just appear as-is (or as "!standard" if it doesn't
match), rather than explaining why the standard preferred content expression
for the group matches or not.
Currently as implemented, it goes to stdout, and so commands like
git-annex find that have custom output will not display --explain
information. Perhaps that should change, dunno.
Sponsored-by: Dartmouth College's DANDI project