A benchmark in my sound repository with `git-annex view feedtitle=*`
took 2:52 wall clock time before and 1:58 after. Though it still only used
130% of CPU.
This is the same kind of optimisation that is in seekFilteredKeys, though
that precaches location logs while this streams the metadata logs direct
to parsing them.
seekFilteredKeys contains more streaming, to find the annexed files, and
this could be further sped up with similar streaming.
Sponsored-by: Nicholas Golder-Manning on Patreon
sync: Fix a bug that caused files to be removed from an importtree=yes
exporttree=yes special remote when the remote's annex-tracking-branch was
not the currently checked out branch.
Sponsored-by: Max Thoursie on Patreon
* sync: When run in a view branch, refresh the view branch to reflect any
changes that have been made to the parent branch or metadata.
This is basically working, but probably needs some more work to deal with
all the edge cases of things sync does.
Sponsored-by: Lawrence Brogan on Patreon
* sync: Avoid pushing view branches to remotes.
* Changed the name of view branches to include the parent branch.
Existing view branches checked out using an old name will still work.
It does not seem useful for sync to push view branches around, because
the information in a view branch can entirely be derived from other
information in git. And sync doesn't push adjusted branches around either.
The better view branch names make it more in line with adjusted branch
names, but were also needed to make fromViewBranch be able to return the
original branch name.
Kept the old view branch names still working. But, when those branches
exist in a repo, sync will still try to push them as before. Avoiding
that would need more complicated and/or expensive changes to sync.
Sponsored-By: Boyd Stephen Smith Jr. on Patreon
* view: New field?=glob and ?tag syntax that includes a directory "_"
in the view for files that do not have the specified metadata set.
* Added annex.viewunsetdirectory git config to change the name of the
"_" directory in a view.
When in a view using the new syntax, old git-annex will fail to parse the
view log. It errors with "Not in a view.", which is not ideal. But that
only affects view commands.
annex.viewunsetdirectory is included in the View for a couple of reasons.
One is to avoid needing to warn the user that it should not be changed when
in a view, since that would confuse git-annex. Another reason is that it
helped with plumbing the value through to some pure functions.
annex.viewunsetdirectory is actually mangled the same as any other view
directory. So if it's configured to something like "N/A", there won't be
multiple levels of directories, which would also confuse git-annex.
Sponsored-By: Jack Hill on Patreon
S3: Support a region= configuration useful for some non-Amazon S3
implementations. This feature needs git-annex to be built with aws-0.24.
datacenter= sets both the AWS hostname and region in one setting, which is
easy when using AWS, but not useful for other hosts. So kept datacenter
as-is, but added this additional config.
Sponsored-By: Brett Eisenberg on Patreon
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!
The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.
Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.
Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.
Sponsored-by: Dartmouth College's DANDI project
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.
Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.
A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.
Also, cleaned up comments on git-annex-find man page asking for --all
option.
Sponsored-by: Dartmouth College's DANDI project
Note that when this is specified and an older git-annex is used to
enableremote such a special remote, it will simply ignore the cost= field
and use whatever the default cost is.
In passing, fixed adb to support the remote.name.cost and
remote.name.cost-command configs.
Sponsored-by: Dartmouth College's DANDI project
Allow initremote of additional special remotes with type=web, in addition
to the default web special remote.
When --sameas=web is used, these provide additional names for the web
special remote, and may also have their own additional configuration
(once there is any for the web special remote) and cost.
Sponsored-by: Dartmouth College's DANDI project
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.
Debian stable does have a too old haskell-git-lfs package to be able to
build git-annex from source, but there is not currently a backport of a
recent git-annex to Debian stable. And if they update the backport at some
point, they should be able to backport the library too.
Sponsored-by: Svenne Krap on Patreon
In Makefile, listed additional deps of Build/Standalone. Without that,
it does not get updated for the change to Utility/LinuxMkLibs.hs when
compiling incrementally.
Sponsored-by: Dartmouth College's DANDI project
Speed up git-annex upgrade (from v5) and init in a repository that has
submodules. Setting the config does not affect the submodules, so avoid
the work of getting status in them, which may involve using the smudge
filter etc.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Added --anything (and --nothing). Eg, git-annex find --anything will list
all annexed files whether or not the content is present. This is slightly
faster and clearer than --include=* or --exclude=*
While I can't imagine how --nothing will be used, preferred content
expressions already had anything and nothing, so might as well support both
as matching options as well.
Sponsored-by: Dartmouth College's Datalad project
Improve handling of some .git/annex/ subdirectories being on other
filesystems, in the bittorrent special remote, and youtube-dl integration,
and git-annex addurl.
The only one of these that I've confirmed to be a problem is in the
bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are
on different filesystems.
As well as auditing for renameFile, also audited for createLink, all of
those are ok as are the other remaining renameFile calls. Also audited all
code paths that use .git/annex/othertmp, and did not find any other
cross-device problems. So, removing mention of othertmp needing to be on
the same device.
Sponsored-by: Dartmouth College's Datalad project
Change --metadata comparisons < > <= and >= to fall back to lexicographical
comparisons when one or both values being compared are not numbers.
Sponsored-by: Erik Bjäreholt on Patreon
Fix a hang that occasionally occurred during commands such as move.
(A bug introduced in 10.20220927, in
commit 6a3bd283b8)
The restage.log was kept locked while running a complex index refresh
action. In an unusual situation, that action could need to write to the
restage log, which caused a deadlock.
The solution is a two-stage process. First the restage.log is moved to a
work file, which is done with the lock held. Then the content of the work
file is read and processed, which happens without the lock being held.
This is all done in a crash-safe manner.
Note that streamRestageLog may not be fully safe to run concurrently
with itself. That's ok, because restagePointerFiles uses it with the
index lock held, so only one can be run at a time.
streamRestageLog does delete the restage.old file at the end without
locking. If a calcRestageLog is run concurrently, it will either see the
file content before it was deleted, or will see it's missing. Either is
ok, because at most this will cause calcRestageLog to report more
work remains to be done than there is.
Sponsored-by: Dartmouth College's Datalad project
Debian is going to drop youtube-dl which is not active upstream, and yt-dlp
is the replacement. This will make it be used if youtube-dl gets removed.
If an old version of youtube-dl remains installed, git-annex will still use
it. That might not be desirable, but changing git-annex to use yt-dlp in
preference to youtube-dl when both are installed risks breaking when
the user has annex.youtube-dl-options set to something that is supported
by youtube-dl, but not by yt-dlp.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
init: Avoid scanning for annexed files, which can be lengthy in a
large repository. Instead that scan is done on demand. This lets git-annex
init be run and some query commands be used in a repository without
waiting.
Note that autoinit already behaved this way, so while this will mean some
commands like git-annex get/unlock/add will do the scan the first time run,
that is not really a significant behavior change.
And, it's really better to have a consistent behavior. The reason for
the inconsistency was a strange bug discussed in
b3c4579c79. Avoiding reconcileStaged in
init will keep avoiding whatever that was.
Sponsored-by: Dartmouth College's DANDI project
Increasing the size of the queue 10x makes git-annex init 7% faster in a
repository with 86000 annexed files.
The memory use goes up, from 70876 kb to 85376 kb.
Avoids database querying overhead when the database is newly created.
In the large repository where git-annex init took 24 seconds, this sped it
up to 20.47 seconds, a speedup of around 15%.
Sponsored-by: Dartmouth College's DANDI project
This reverts commit 15dd7fe84b.
aws 0.23 is not used any longer, so read-only S3 import won't be
supported yet when building with stack.
That commit broke the build on windows, because the new version of Win32 that
was included (because the old one does not work with this lts version)
needs a version of filepath that is newer than the one bundled with the
ghc in that lts version. It is not possible to override that to a newer
filepath.
Seems that the only solution to get aws 0.23 will be to wait for a ghc that
contains filepath 1.4.100.0. No ghc yet contains it. (Backporting the Win32
fix to a point release version that does not include this bleeding edge
filepath would also resolve it, but seems unlikely to happen.)
Sponsored-by: Jarkko Kniivilä on Patreon
export: Fix a bug that left a file on a special remote when two files with
the same content were both deleted in the exported tree.
Case of the wrong data structure leading to the wrong result.
The DiffMap now contains all the old filenames, and all the new filenames.
Note that, when 2 files with the same content are both renamed,
it only renames the first, but deletes and re-exports the second.
Improving that is possible, but it would need to use a different temporary
filename. Anyway, that is an unusual case, and there are known to be other
unusual cases where export does not rename with maximum efficiency, IIRC.
(Or maybe this is the case that I remember?)
Sponsored-by: Dartmouth College's OpenNeuro project
This allows building with aws-0.23
Win32-2.13.4.0 contains a function that is not in lts-19.32 yet. Adding
it to stack.yaml does not seem to cause problems when building on linux.
aws-0.23 has been released.
When built with an older aws, initremote will error out when run
with signature=anonymous. And when a remote has been initialized with
that by a version of git-annex that does support it, older versions will
fail when the remote is accessed, with a useful error message.
Sponsored-by: Dartmouth College's DANDI project
Clean the standalone environment before running the su command
to run "sh". Otherwise, PATH leaked through, causing it to run
git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set,
which caused that script to not work:
[2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"]
/home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found
Changed programPath to not use GIT_ANNEX_PROGRAMPATH,
but instead run the scripts at the top of GIT_ANNEX_DIR.
That works both when the standalone environment is set up, and when it's
not.
Sponsored-by: Kevin Mueller on Patreon
Make --batch mode handle unstaged annexed files consistently whether the
file is unlocked or not. Before this, a unstaged locked file
would have the symlink on disk examined and operated on in --batch mode,
while an unstaged unlocked file would be skipped.
Note that, when not in batch mode, unstaged files are skipped over too.
That is actually somewhat new behavior; as late as 7.20191114 a
command like `git-annex whereis .` would operate on unstaged locked
files and skip over unstaged unlocked files. That changed during
optimisation of CmdLine.Seek with apparently little fanfare or notice.
Turns out that rmurl still behaved that way when given an unstaged file
on the command line. It was changed to use lookupKeyStaged to
handle its --batch mode. That also affected its non-batch mode, but
since that's just catching up to the change earlier made to most
other commands, I have not mentioed that in the changelog.
It may be that other uses of lookupKey should also change to
lookupKeyStaged. But it may also be that would slow down some things,
or lead to unwanted behavior changes, so I've kept the changes minimal
for now.
An example of a place where the use of lookupKey is better than
lookupKeyStaged is in Command.AddUrl, where it looks to see if the file
already exists, and adds the url to the file when so. It does not matter
there whether the file is staged or not (when it's locked). The use of
lookupKey in Command.Unused likewise seems good (and faster).
Sponsored-by: Nicholas Golder-Manning on Patreon
While ErrorBusy and other exceptions were caught and the write retried for
up to 10 seconds, it was still possible for git-annex to eventually
give up and error out without writing to the database. Now it will retry
as long as necessary.
This does mean that, if one git-annex process is suspended just as sqlite
has locked the database for writing, another git-annex that tries to write
it it might get stuck retrying forever. But, that could already happen when
opening the sqlite database, which retries forever on ErrorBusy. This is an
area where git-annex is known to not behave well, there's a todo about the
general case of it.
Sponsored-by: Dartmouth College's Datalad project
When running eg git-annex get, for each file it has to read from and
write to the keys database. But it's reading exclusively from one table,
and writing to a different table. So, it is not necessary to flush the
write to the database before reading. This avoids writing the database
once per file, instead it will buffer 1000 changes before writing.
Benchmarking getting 1000 small files from a local origin,
git-annex get now takes 13.62s, down from 22.41s!
git-annex drop now takes 9.07s, down from 18.63s!
Wowowowowowowow!
(It would perhaps have been better if there were separate databases for
the two tables. At least it would have avoided this complexity. Ah well,
this is better than splitting the table in a annex.version upgrade.)
Sponsored-by: Dartmouth College's Datalad project
When importing from versioned remotes, fix tracking of the content of
deleted files.
Only S3 supports versioning so far, so only it was affected.
But, the draft import/export interface for external remotes also seemed to
need a change, so that versionedExport could be set.
S3: Speed up importing from a large bucket when fileprefix= is set by only
asking for files under the prefix.
getBucket still returns the files with the prefix included, so the rest of
the fileprefix stripping still works unchanged.
Sponsored-by: Dartmouth College's DANDI project
This can be used, for example, with importtree=yes to import from a public
bucket.
This needs a patch that has not yet landed in the aws library, and will
need to be adjusted to support compiling with old versions of the library,
so is not yet suitable for merging.
See https://github.com/aristidb/aws/pull/281
The stack.yaml changes are provided to show how to build against the aws
fork and will need to be reverted as well.
Sponsored-by: Dartmouth College's DANDI project
move: Fix openFile crash with -J
This does make them a bit slower, although usually the log file is not
very big, so even when it's being rewritten, they will not block for
long taking the lock. Still, little slowdowns may add up when moving a lot
file files.
A less expensive fix would be to use something lower level than openFile
that does not check if the file is already open for write by another
thread. But GHC does not seem to provide anything convenient; even mkFD
checks for a writing thread.
fullLines is no longer necessary since these functions no longer will
read the file while it's being written.
Sponsored-by: Dartmouth College's DANDI project
* trust, untrust, semitrust, dead: Fix behavior when provided with
multiple repositories to operate on.
* trust, untrust, semitrust, dead: When provided with no parameters,
do not operate on a repository that has an empty name.
The man page and usage already indicated that multiple repos could be
provided to these commands, but they actually used unwords to combine
everything into string, and found a repo matching that string. This was
especially bad when no parameters resulted in the empty string and some
repo happened to have an empty description.
This does change the behavior, and it's possible someone relied on the
current behavior to eg, trust a repo by name with the name not quoted into
a single parameter. But fixing the empty string bug and matching the
documentation are worth breaking that usage.
Note that git-annex init/reinit do still unwords multiple parameters when
provided to them. That is inconsistent behavior, but it certianly seems
possible that something does run git-annex init with an unquoted
description, and I don't think it's worth breaking that just to make it more
consistent with these other commands.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
Avoids displaying warning about git-annex restage needing to be run in
situations where it does not.
Closing a handle flushes it anyway, so no need for an explict flush. The
handle does get closed twice, but that's fine, the second one does nothing.
Sponsored-by: Dartmouth College's DANDI project
Well, actually, fix a typo that has always been in the implementation of
that. "inbacked" used to work, but let's not tell users about that; they
might try to use it and expect git-annex to keep supporting the typo..
Sponsored-by: Jack Hill on Patreon
p2p: Pass wormhole the --appid option before the receive/send command, as
it does not accept that option after the command
I'm left wondering, did I get this wrong from the beginning, or did
wormhole change its option parser? I'm reminded of the change in 0.8.2
where it silently changed what FD the pairing code was output to.
But, looking at the wormhole source, it was at least putting --appid before
send in its test suite from the introduction of the option.
So I think probably this has always been broken. On 2021-12-31 the --appid
option was enabled, and it took until now for someone to try
git-annex p2p --pair and notice that flag day broke it..
Sponsored-by: Svenne Krap on Patreon
Let GIT_DIR and --git-dir override git's protection against operating in a
repository owned by another user.
This is the same behavior other git commands have.
Sponsored-by: Jarkko Kniivilä on Patreon
This is much easier and less failure-prone than having the user run
git update-index --refresh themselves.
Sponsored-by: Dartmouth College's DANDI project
Fixes updating git index file after getting an unlocked file when
annex.stalldetection is set.
The transferrer may want to send additional protocol messages when it's
shut down. Closing the read handle prevented it from doing that, and caused
it to crash rather than cleanly shutting down.
Draining the handle without processing the protocol seemed ok to do,
because anything it outputs is going to be some side message displayed
at shutdown. Displaying those once per transferrer process that is running
seems unncessary.
Sponsored-by: Dartmouth College's DANDI project
When concurrency is enabled, there can be worker threads still running
when the time limit is checked. Exiting right there does not
give those threads time to finish what they're doing. Instead, the seeking
is wrapped up, and git-annex then shuts down cleanly.
The whole point of --time-limit existing, rather than using timeout(1)
when running git-annex is to let git-annex finish the action(s) it is
working on when the time limit is reached, and shut down cleanly.
I noticed this problem when investigating why restagePointerFile might
not have run after get/drop of an unlocked file. With --time-limit -J,
a worker thread may have finished updating a work tree file, and be killed
by the time limit check before it can run restagePointerFile. So despite
--time-limit running the shutdown actions, the work tree file didn't get
restaged.
Sponsored-by: Dartmouth College's DANDI project
It's hard to know what's a good default for this. But 1 mb seems way too
small, because it's very easy for a git pull or some similar operation
that we don't think of as using much space to use up 1 mb of space.
Most people would want to free up some space if a filesystem only had 100
mb free. But on a small VPS, it's probably not uncommon to have only 1 gb
free. So 1 gb is too large for annex.diskreserve.
While old 1 gb USB keys are around, it's unlikely that anyone is
relying on them to shuttle annex data around; it would be worth anyone's
time to upgrade to a 32 gb or larger cheap modern USB key ($5).
Sponsored-by: Kevin Mueller on Patreon
Fix a reversion that prevented git-annex from working in a repository when
--git-dir or GIT_DIR is specified to relocate the git directory to
somewhere else. (Introduced in version 10.20220525)
checkRepoConfigInaccessible could still run git config --list, just passing
--git-dir. It seems not necessary, because I know that passing --git-dir
bypasses git's check for repo ownership. I suppose it might be that git
eventually changes to check something about the ownership of the working
tree, so passing --git-dir without --work-tree would still be worth doing.
But for now this is the simple fix.
Sponsored-by: Nicholas Golder-Manning on Patreon
Combined with commit 0ffc59d341, this
fixes the case where there are duplicate files on the special remote,
and the first gets modified/deleted, while the second is still present.
directory, adb: Fixed a bug when importtree=yes, and multiple files in the
special remote have the same content, that caused it to refuse to get a
file from the special remote, incorrectly complaining that it had changed,
due to only accepting the inode+mtime of one file (that was since modified
or deleted) and not accepting the inode+mtime of other duplicate files.
Sponsored-by: Max Thoursie on Patreon
Improve handling of directory special remotes with importtree=yes whose
ignoreinode setting has been changed. (By either enableremote or by
upgrading to commit 3e2f1f73cbc5fc10475745b3c3133267bd1850a7.)
When getting a file from such a remote, accept the content that would have
been accepted with the previous ignoreinode setting.
After a change to ignoreinode, importing a tree from the remote will
re-import and generate new content identifiers using the new config. So
when ignoreinode has changed to no, the inodes will be learned, and after
that point, a change in an inode will be detected as a change. Before
re-importing, a change in an inode will be ignored, as it was before the
ignoreinode change. This seems acceptble, because the user can re-import
immediately if they urgently need to add inodes. And if not, they'll
do it sometime, presumably, and the change will take effect then.
Sponsored-by: Erik Bjäreholt on Patreon
Fix a reversion that made dead keys not be skipped when operating on all
keys via --all or in a bare repo. (Introduced in version 8.20200720)
Also, improved the documentation of git-annex-dead, it does not only apply
to fsck --all.
Also, made git-annex fsck, when run on a file whose key is dead, display
that. Before, it displayed that only when run with --all, but with this
fix, it skips dead keys with --all. But it can still be run on a file that
uses a dead key, and displaying "This key is dead" explains to the user
why it does not consider missing content for it to be a problem.
Sponsored-by: k0ld on Patreon
Use curl for downloads from git remotes when annex.url-options and other
git configs are set.
If the url needs a password, curl will fail, and git credential will not be
used to prompt for it. But the user can set --netrc in url-options and
put the password in the netrc file.
This also means that url-options settings like -4 will take effect.
That was the case before commit 1883f7ef8f
forced conduit to be used.
When accessing a git remote over http needs a git credential prompt for a
password, cache it for the lifetime of the git-annex process, rather than
repeatedly prompting.
The git-lfs special remote already caches the credential when discovering
the endpoint. And presumably commands like git pull do as well, since they
may download multiple urls from a remote.
The TMVar CredentialCache is read, so two concurrent calls to
getBasicAuthFromCredential will both prompt for a credential.
There would already be two concurrent password prompts in such a case,
and existing uses of `prompt` probably avoid it. Anyway, it's no worse
than before.
Fix crash importing from a directory special remote that contains a broken
symlink.
The crash was in listImportableContentsM but some other places in
Remote.Directory also seemed like they could have the same problem.
Also audited for other places that have such a problem. Not all calls
to getFileStatus are bad, in some cases it's better to crash on something
unexpected. For example, `git-annex import path` when the path is a broken
symlink should crash, the same as when it does not exist. Many of the
getFileStatus calls are like that, particularly when they involve
.git/annex/objects which should never have a broken symlink in it.
Fixed a few other possible cases of the problem.
Sponsored-by: Lawrence Brogan on Patreon
Trick the linker into not doing unncessary work searching for optimised
libraries that are not present, by symlinking the directories where
optimised libs would be to the main lib dir.
This reduces the ENOENT of git-annex init by about 1/2. The linker always
finds the files where it looks first time now. I have not looked at what
the wall clock speedup might be, it's probably rather small.
If a x86-64-v5 comes to be, the list will need to be extended. And there
may be other directories used on some machines that I have missed. Not done
for arm64 yet, or any uncommon architectures.
Sponsored-by: Dartmouth College's Datalad project
For some reason, cabal 3.4.1.0 builds w/o the assistant and webapp,
even when the flag is explicitly turned on. Moving the build-depends from
inside the if flag section to the main build-depends somehow fixes this.
Since the webapp build deps are thus always available, there is no reason
not to build the webapp when building the assistant. So, got rid of the
webapp build flag. Kept the assistant build flag for now, since building
without it does at least still speed up the build.
Sponsored-by: Brock Spratlen on Patreon
Too big a footgun.
This does not prevent attackers who can write to the directory being
imported from racing the check. But they can cause anything to be imported
anyway, so would be limited to getting the legacy import to follow into a
directory they do not write to, and move files out of it into the annex.
(The directory special remote does not have that problem since it does not
move files.)
Sponsored-by: Jack Hill on Patreon
Fix a regression in 10.20220624 that caused git-annex add to crash when
there was an unstaged deletion.
Sponsored-by: Dartmouth College's Datalad project
Use curl when annex.security.allowed-url-schemes includes an url scheme not
supported by git-annex internally, as long as
annex.security.allowed-ip-addresses is configured to allow using curl.
Sponsored-by: Luke Shumaker on Patreon
WIP: This is mostly complete, but there is a problem: createDirectoryUnder
throws an error when annex.dbdir is set to outside the git repo.
annex.dbdir is a workaround for filesystems where sqlite does not work,
due to eg, the filesystem not properly supporting locking.
It's intended to be set before initializing the repository. Changing it
in an existing repository can be done, but would be the same as making a
new repository and moving all the annexed objects into it. While the
databases get recreated from the git-annex branch in that situation, any
information that is in the databases but not stored in the branch gets
lost. It may be that no information ever gets stored in the databases
that cannot be reconstructed from the branch, but I have not verified
that.
Sponsored-by: Dartmouth College's Datalad project
Since bup split is not concurrency safe.
Used a lock file so that 2 git-annex processes only run one bup split
between them (per bup repo).
(Concurrent writes from different git-annex repository clones to the same
bup repo could still have concurrency problems.)
Sponsored-by: Noam Kremen on Patreon
It seems worth noting here that I emailed bup's author about bup split
being noisy on stderr even with -q in approximately 2011. That never got
fixed. Its current repo on github only accepts pull requests, not bug
reports. Needing to add such complexity to deal with such a longstanding
unfixed issue is not fun.
Sponsored-by: Kevin Mueller on Patreon
bup split outputs to stderr even with -q. This was discarded when using -J,
but it was still outputting when not using -J, and so was git-annex.
Sponsored-by: Nicholas Golder-Manning on Patreon
Work around bug in git 2.37 that causes a segfault when when
core.untrackedCache is set, and broke git-annex init.
Depending on when git gets fixed and how widely the buggy versions are
used, this could be reverted quite soon, or need to linger for a long time.
It only makes git-annex init a tiny bit slower in a new repo.
Sponsored-by: Max Thoursie on Patreon
This is intended for users who want to see what it would output in order to
eg, check if a file would be added to git or the annex. It is not intended
as a way for scripts to get information.
Sponsored-by: Dartmouth College's Datalad project
Last try at this broke on windows with a problem installing ghc, but I
wanted to try again.
Also this has a version of aws that allows using aeson 2.0, which has a
potential security fix.
(And v9 later on to v10.)
When v9/v10 were added, making v8 automatically upgrade was deferred
"for a few months" to prevent interoperability problems if users also
have an old version of git-annex. Of course that could still be the
case, but there has been a good amount of time and this can't be put off
forever.
Allow setting annex.autoupgraderepository to false to avoid this upgrade.
Previously, that only prevented upgrades from no longer supported git-annex
versions, but v8 is still supported, and users may want to keep on v8 to
interoperate with an old git-annex version.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
I would like for a new repo version to enable appends, but to do so
safely would need a v11 followed by a 1 year delay followed by a v12
that does it. Since a similar v9 and v10 transition is currently
happening, and is less than 6 months along in most repos, it does not
feel wise to stack up another year-long transition behind that. What if
I need to hurry up a new repo version for some other change?
Added todo so I remember to make this change at some time when a v11
and probably v12 repo version do make sense.
Sponsored-by: Dartmouth College's DANDI project
Added annex.alwayscompact setting which can be unset to speed up writes to
the git-annex branch in some cases.
Sponsored-by: Dartmouth College's DANDI project
Fix a reversion that prevented --batch commands (and the assistant)
from noticing data written to the journal by other commands.
I have not identified which commit broke this for sure,
but probably it was aeca7c2207
--batch commands that wrote to the journal avoided the problem since
journalIgnorable sets unset on write. It's a little bit surprising that
nobody noticed that query --batch commands did not see data written by
other commands.
Sponsored-by: Dartmouth College's DANDI project
It does not make sense for either; importing from an existing bucket should
not write to it. And the user may not have write access at all. And exporting to
a bucket should not write other files.
Also this prevents the uuid file being imported after being written.
Sponsored-by: Dartmouth College's DANDI project
To avoid using find -printf, which was first supported in Android around
2019-2020.
Probing seems too fragile, and execing stat once per file is too slow to do
when there's a faster way available, which brought me to an option...
Sponsored-by: Brett Eisenberg on Patreon