Actually very straightforward reuse of the metadata log file code.
Although I had to add a todo item as git-annex forget won't clean up
dead remote's metadata yet.
This would be worth adding to the external special remote interface
sometime. Have not opened a todo though, guess I'll wait until something
needs it.
This commit was supported by the NSF-funded DataLad project.
Have to store the S3 object along with the version ID, so retrieval can
use the same object.
This commit was supported by the NSF-funded DataLad project.
Only done when versioning=yes is configured. It could always do it when
S3 sends back a version id, but there may be buckets that have
versioning enabled by accident, so it seemed better to honor the
configuration.
S3's docs say version IDs are "randomly generated", so presumably
storing the same content twice gets two different ones not the same one.
So I considered storing a list of version IDs for a key. That would
allow removing the key completely. But.. The way Logs.RemoteState works,
when there are multiple writers, the last writer wins. So storing a list
would need a different log format that merges, which seemed overkill to support
removing a key from an append-only remote.
Note that Logs.RemoteState for S3 is now dedicated to version IDs.
If something else needs to be stored, a new log will be needed to do it.
This commit was supported by the NSF-funded DataLad project.
Make exporttree=yes remotes that are appendonly not be untrusted, and not force
verification of content, since the usual concerns about losing data when an
export is updated by someone else don't apply.
Note that all the remote operations on keys are left as usual for
appendonly export remotes, except for storing content.
This commit was supported by the NSF-funded DataLad project.
Make git-annex sync and the assistant skip trying to drop from appendonly
remotes since it's just going to fail.
git-annex drop and similar commands will still try to drop from
appendonly, so the user will see failure messages when they try to do
that. To do otherwise would be confusing since the user has explicitly
asked for a drop with those commands.
This commit was supported by the NSF-funded DataLad project.
Make `git annex export` check appendonly when removing a file from an
export, and not update the location log, since the remote still contains
the content.
This commit was supported by the NSF-funded DataLad project.
Does nothing yet.
Considered making bup readonly, but while the content can't be removed,
it is able to delete a branch, so didn't.
This commit was supported by the NSF-funded DataLad project.
In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external
hashers". Re-benchmarking today, I found cryptonite's sha256 consistently
outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb
files with both sha256 and sha512. And for smaller files, the external
process startup time swamps the hash time.
Perhaps cryptonite has improved. Or it could just do better on my
current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite
is slower in some situations, seems likely it would only be marginally slower;
it's got the same class of highly optimised C code under the hood as coreutils.
The main difference between the two sha256 implementations seems to be
how much of the inner loop they unroll..
This commit was sponsored by Henrik Riomar on Patreon.
Probably not noticed until now because the queue is large enough that two
threads each filling theirs at the same time and flushing is unlikely to
happen.
Also made explicit that each worker thread gets its own queue.
I think that was the case before, but if something was put in the queue
before worker threads were forked off, they could have each inherited the
same queue.
Could have gone with a single shared queue, but per-worker queues is more
efficient, because a worker can add lots of stuff to its own queue without
any locking.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Avoids annex.largefiles inconsitency and also avoids a lot of
unneccessary calls to the clean filter when a large repo's clone
is being initialized.
This commit was supported by the NSF-funded DataLad project.
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.
Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.
Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.
This commit was supported by the NSF-funded DataLad project.