This is much easier and less failure-prone than having the user run
git update-index --refresh themselves.
Sponsored-by: Dartmouth College's DANDI project
Fixes updating git index file after getting an unlocked file when
annex.stalldetection is set.
The transferrer may want to send additional protocol messages when it's
shut down. Closing the read handle prevented it from doing that, and caused
it to crash rather than cleanly shutting down.
Draining the handle without processing the protocol seemed ok to do,
because anything it outputs is going to be some side message displayed
at shutdown. Displaying those once per transferrer process that is running
seems unncessary.
Sponsored-by: Dartmouth College's DANDI project
When concurrency is enabled, there can be worker threads still running
when the time limit is checked. Exiting right there does not
give those threads time to finish what they're doing. Instead, the seeking
is wrapped up, and git-annex then shuts down cleanly.
The whole point of --time-limit existing, rather than using timeout(1)
when running git-annex is to let git-annex finish the action(s) it is
working on when the time limit is reached, and shut down cleanly.
I noticed this problem when investigating why restagePointerFile might
not have run after get/drop of an unlocked file. With --time-limit -J,
a worker thread may have finished updating a work tree file, and be killed
by the time limit check before it can run restagePointerFile. So despite
--time-limit running the shutdown actions, the work tree file didn't get
restaged.
Sponsored-by: Dartmouth College's DANDI project
It's hard to know what's a good default for this. But 1 mb seems way too
small, because it's very easy for a git pull or some similar operation
that we don't think of as using much space to use up 1 mb of space.
Most people would want to free up some space if a filesystem only had 100
mb free. But on a small VPS, it's probably not uncommon to have only 1 gb
free. So 1 gb is too large for annex.diskreserve.
While old 1 gb USB keys are around, it's unlikely that anyone is
relying on them to shuttle annex data around; it would be worth anyone's
time to upgrade to a 32 gb or larger cheap modern USB key ($5).
Sponsored-by: Kevin Mueller on Patreon
Fix a reversion that prevented git-annex from working in a repository when
--git-dir or GIT_DIR is specified to relocate the git directory to
somewhere else. (Introduced in version 10.20220525)
checkRepoConfigInaccessible could still run git config --list, just passing
--git-dir. It seems not necessary, because I know that passing --git-dir
bypasses git's check for repo ownership. I suppose it might be that git
eventually changes to check something about the ownership of the working
tree, so passing --git-dir without --work-tree would still be worth doing.
But for now this is the simple fix.
Sponsored-by: Nicholas Golder-Manning on Patreon
Combined with commit 0ffc59d341, this
fixes the case where there are duplicate files on the special remote,
and the first gets modified/deleted, while the second is still present.
directory, adb: Fixed a bug when importtree=yes, and multiple files in the
special remote have the same content, that caused it to refuse to get a
file from the special remote, incorrectly complaining that it had changed,
due to only accepting the inode+mtime of one file (that was since modified
or deleted) and not accepting the inode+mtime of other duplicate files.
Sponsored-by: Max Thoursie on Patreon
Improve handling of directory special remotes with importtree=yes whose
ignoreinode setting has been changed. (By either enableremote or by
upgrading to commit 3e2f1f73cbc5fc10475745b3c3133267bd1850a7.)
When getting a file from such a remote, accept the content that would have
been accepted with the previous ignoreinode setting.
After a change to ignoreinode, importing a tree from the remote will
re-import and generate new content identifiers using the new config. So
when ignoreinode has changed to no, the inodes will be learned, and after
that point, a change in an inode will be detected as a change. Before
re-importing, a change in an inode will be ignored, as it was before the
ignoreinode change. This seems acceptble, because the user can re-import
immediately if they urgently need to add inodes. And if not, they'll
do it sometime, presumably, and the change will take effect then.
Sponsored-by: Erik Bjäreholt on Patreon
Fix a reversion that made dead keys not be skipped when operating on all
keys via --all or in a bare repo. (Introduced in version 8.20200720)
Also, improved the documentation of git-annex-dead, it does not only apply
to fsck --all.
Also, made git-annex fsck, when run on a file whose key is dead, display
that. Before, it displayed that only when run with --all, but with this
fix, it skips dead keys with --all. But it can still be run on a file that
uses a dead key, and displaying "This key is dead" explains to the user
why it does not consider missing content for it to be a problem.
Sponsored-by: k0ld on Patreon
Use curl for downloads from git remotes when annex.url-options and other
git configs are set.
If the url needs a password, curl will fail, and git credential will not be
used to prompt for it. But the user can set --netrc in url-options and
put the password in the netrc file.
This also means that url-options settings like -4 will take effect.
That was the case before commit 1883f7ef8f
forced conduit to be used.
When accessing a git remote over http needs a git credential prompt for a
password, cache it for the lifetime of the git-annex process, rather than
repeatedly prompting.
The git-lfs special remote already caches the credential when discovering
the endpoint. And presumably commands like git pull do as well, since they
may download multiple urls from a remote.
The TMVar CredentialCache is read, so two concurrent calls to
getBasicAuthFromCredential will both prompt for a credential.
There would already be two concurrent password prompts in such a case,
and existing uses of `prompt` probably avoid it. Anyway, it's no worse
than before.
Fix crash importing from a directory special remote that contains a broken
symlink.
The crash was in listImportableContentsM but some other places in
Remote.Directory also seemed like they could have the same problem.
Also audited for other places that have such a problem. Not all calls
to getFileStatus are bad, in some cases it's better to crash on something
unexpected. For example, `git-annex import path` when the path is a broken
symlink should crash, the same as when it does not exist. Many of the
getFileStatus calls are like that, particularly when they involve
.git/annex/objects which should never have a broken symlink in it.
Fixed a few other possible cases of the problem.
Sponsored-by: Lawrence Brogan on Patreon
Trick the linker into not doing unncessary work searching for optimised
libraries that are not present, by symlinking the directories where
optimised libs would be to the main lib dir.
This reduces the ENOENT of git-annex init by about 1/2. The linker always
finds the files where it looks first time now. I have not looked at what
the wall clock speedup might be, it's probably rather small.
If a x86-64-v5 comes to be, the list will need to be extended. And there
may be other directories used on some machines that I have missed. Not done
for arm64 yet, or any uncommon architectures.
Sponsored-by: Dartmouth College's Datalad project
For some reason, cabal 3.4.1.0 builds w/o the assistant and webapp,
even when the flag is explicitly turned on. Moving the build-depends from
inside the if flag section to the main build-depends somehow fixes this.
Since the webapp build deps are thus always available, there is no reason
not to build the webapp when building the assistant. So, got rid of the
webapp build flag. Kept the assistant build flag for now, since building
without it does at least still speed up the build.
Sponsored-by: Brock Spratlen on Patreon
Too big a footgun.
This does not prevent attackers who can write to the directory being
imported from racing the check. But they can cause anything to be imported
anyway, so would be limited to getting the legacy import to follow into a
directory they do not write to, and move files out of it into the annex.
(The directory special remote does not have that problem since it does not
move files.)
Sponsored-by: Jack Hill on Patreon
Fix a regression in 10.20220624 that caused git-annex add to crash when
there was an unstaged deletion.
Sponsored-by: Dartmouth College's Datalad project
Use curl when annex.security.allowed-url-schemes includes an url scheme not
supported by git-annex internally, as long as
annex.security.allowed-ip-addresses is configured to allow using curl.
Sponsored-by: Luke Shumaker on Patreon
WIP: This is mostly complete, but there is a problem: createDirectoryUnder
throws an error when annex.dbdir is set to outside the git repo.
annex.dbdir is a workaround for filesystems where sqlite does not work,
due to eg, the filesystem not properly supporting locking.
It's intended to be set before initializing the repository. Changing it
in an existing repository can be done, but would be the same as making a
new repository and moving all the annexed objects into it. While the
databases get recreated from the git-annex branch in that situation, any
information that is in the databases but not stored in the branch gets
lost. It may be that no information ever gets stored in the databases
that cannot be reconstructed from the branch, but I have not verified
that.
Sponsored-by: Dartmouth College's Datalad project
Since bup split is not concurrency safe.
Used a lock file so that 2 git-annex processes only run one bup split
between them (per bup repo).
(Concurrent writes from different git-annex repository clones to the same
bup repo could still have concurrency problems.)
Sponsored-by: Noam Kremen on Patreon
It seems worth noting here that I emailed bup's author about bup split
being noisy on stderr even with -q in approximately 2011. That never got
fixed. Its current repo on github only accepts pull requests, not bug
reports. Needing to add such complexity to deal with such a longstanding
unfixed issue is not fun.
Sponsored-by: Kevin Mueller on Patreon
bup split outputs to stderr even with -q. This was discarded when using -J,
but it was still outputting when not using -J, and so was git-annex.
Sponsored-by: Nicholas Golder-Manning on Patreon
Work around bug in git 2.37 that causes a segfault when when
core.untrackedCache is set, and broke git-annex init.
Depending on when git gets fixed and how widely the buggy versions are
used, this could be reverted quite soon, or need to linger for a long time.
It only makes git-annex init a tiny bit slower in a new repo.
Sponsored-by: Max Thoursie on Patreon
This is intended for users who want to see what it would output in order to
eg, check if a file would be added to git or the annex. It is not intended
as a way for scripts to get information.
Sponsored-by: Dartmouth College's Datalad project
Last try at this broke on windows with a problem installing ghc, but I
wanted to try again.
Also this has a version of aws that allows using aeson 2.0, which has a
potential security fix.
(And v9 later on to v10.)
When v9/v10 were added, making v8 automatically upgrade was deferred
"for a few months" to prevent interoperability problems if users also
have an old version of git-annex. Of course that could still be the
case, but there has been a good amount of time and this can't be put off
forever.
Allow setting annex.autoupgraderepository to false to avoid this upgrade.
Previously, that only prevented upgrades from no longer supported git-annex
versions, but v8 is still supported, and users may want to keep on v8 to
interoperate with an old git-annex version.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
I would like for a new repo version to enable appends, but to do so
safely would need a v11 followed by a 1 year delay followed by a v12
that does it. Since a similar v9 and v10 transition is currently
happening, and is less than 6 months along in most repos, it does not
feel wise to stack up another year-long transition behind that. What if
I need to hurry up a new repo version for some other change?
Added todo so I remember to make this change at some time when a v11
and probably v12 repo version do make sense.
Sponsored-by: Dartmouth College's DANDI project
Added annex.alwayscompact setting which can be unset to speed up writes to
the git-annex branch in some cases.
Sponsored-by: Dartmouth College's DANDI project
Fix a reversion that prevented --batch commands (and the assistant)
from noticing data written to the journal by other commands.
I have not identified which commit broke this for sure,
but probably it was aeca7c2207
--batch commands that wrote to the journal avoided the problem since
journalIgnorable sets unset on write. It's a little bit surprising that
nobody noticed that query --batch commands did not see data written by
other commands.
Sponsored-by: Dartmouth College's DANDI project
It does not make sense for either; importing from an existing bucket should
not write to it. And the user may not have write access at all. And exporting to
a bucket should not write other files.
Also this prevents the uuid file being imported after being written.
Sponsored-by: Dartmouth College's DANDI project
To avoid using find -printf, which was first supported in Android around
2019-2020.
Probing seems too fragile, and execing stat once per file is too slow to do
when there's a faster way available, which brought me to an option...
Sponsored-by: Brett Eisenberg on Patreon
This caused git to complain that filter-process failed and kill it with
signal 15. Because it wrote an extra flushPkt for an empty file, which
git did not expect, and so git saw an unexpected response to the next
request.
Luckily, filter-process is only used by default in v9 and up, and v8 is
still the default. Also, git had to be updating an empty file, followed
by another file, which is a fairly unlikely situation. And git restarts
filter-process after this happens and uses it to filter the rest of the
files. So this isn't a crippling bug.
Sponsored-by: Luke Shumaker on Patreon
This reverts commit d2bc268317.
That seemed to break building on windows, before it starts building
git-annex at all, it tried to install ghc and something blew up:
Processing archive: C:\Users\runneradmin\AppData\Local\Programs\stack\x86_64-windows\ghc-9.0.2.tar.xz
Extracting ghc-9.0.2.tar
...
Extracted total of 11790 files from ghc-9.0.2.tar
C:\Users\runneradmin\AppData\Local\Programs\stack\x86_64-windows\ghc-9.0.2-tmp-6d0fbe7f3b29e56c\ghc-9.0.2\: renameDirectory:pathIsDirectory:CreateFile "\\\\?\\C:\\Users\\runneradmin\\AppData\\Local\\Programs\\stack\\x86_64-windows\\ghc-9.0.2-tmp-6d0fbe7f3b29e56c\\ghc-9.0.2\\": does not exist (The system cannot find the file specified.)
Hopefully a newer ghc version or updated stackage version will fix this
at some point, in the meantime revert it.
The webapp modules cannot build with the assistant disabled, so make the
webapp be under the assistant build flag.
Sponsored-by: Jarkko Kniivilä on Patreon
--backend is no longer a global option, and is only accepted by commands
that actually need it.
Three commands that used to support backend but don't any longer are
watch, webapp, and assistant. It would be possible to make them support it,
but I doubt anyone used the option with these. And in the case of webapp
and assistant, the option was handled inconsistently, only taking affect
when the command is run with an existing git-annex repo, not when it
creates a new one.
Also, renamed GlobalOption etc to AnnexOption. Because there are many
options of this type that are not actually global (any more) and get
added to commands that need them.
Sponsored-by: Kevin Mueller on Patreon
Improve handling of parallelization with -J when copying content from/to a
git remote that is a local path.
Sponsored-by: Nicholas Golder-Manning on Patreon
When adding a small file, it does not get locked down, so can be modified
after git-annex checks that it's small. The use of queued git add made the
race window nice and wide too.
Fixed by checking if the file has changed, and by not using git add.
Instead, have to recapitulate git add's handling of things like symlinks
and executable files.
Sponsored-by: Jochen Bartl on Patreon
The remaining callers all did not rely on it checking gitignore, so were
easy to convert.
They were susceptable to the same overwrite race as add and fix,
although less likely to have it and a narrower window than add's race.
Command.Rekey in passing got an unncessary call to removeFile deleted.
addSymlink handles deleting any existing worktree file.
Similar to git-annex add, git-annex fix queued git add, so if a file
got modified before git add ran, the wrong content would be staged,
perhaps a large file content.
Sponsored-by: Brock Spratlen on Patreon
This is not a complete fix for all such races, only the one where a
large file gets changed while adding and gets added to git rather than
to the annex.
addLink needs to go away, any caller of it is probably subject to the
same kind of race. (Also, addLink itself fails to check gitignore when
symlinks are not supported.)
ingestAdd no longer checks gitignore. (It didn't check it consistently
before either, since there were cases where it did not run git add!)
When git-annex import calls it, it's already checked gitignore itself
earlier. When git-annex add calls it, it's usually on files found
by withFilesNotInGit, which handles checking ignores.
There was one other case, when git-annex add --batch calls it. In that
case, old git-annex behaved rather badly, it would seem to add the file,
but git add would later fail, leaving the file as an unstaged annex symlink.
That behavior has also been fixed.
Sponsored-by: Brett Eisenberg on Patreon
move: Improve resuming a move that succeeded in transferring the content,
but where dropping failed due to eg a network problem, in cases where
numcopies checks prevented the resumed move from dropping the object from
the source repository.
This was earlier done for moves that got interrupted during the drop stage.
Sponsored-by: Svenne Krap on Patreon
Fix retrival of an empty file that is stored in a special remote with
chunking enabled.
The speculative chunk stuff caused a reversion by adding an empty list for
the empty file. Which is just wrong; the empty file is still stored on the
remote, and should be retrieved like any other file. It uses 1 chunk, so
`max 1` is the simple fix.
Sponsored-by: Noam Kremen on Patreon
rather than matching path of an existing remote to find the uuid.
The main benefit of this is that locations not using ssh:// will work
now, including both paths and host:/path
The other benefit is that it's a simpler interface, no need to have an
existing remote with the same url and some other name. Although that
will still work of course.
This does rely on tryGitConfigRead working when given a Git.Repo that is
not a remote. Luckily, it works fine that way.
Also, tryGitConfigRead will auto-init a local repo that has a git-annex
branch. I did not enable auto-init of ssh repos though.
The uuid discovery actually happens twice; initremote discovers it,
and uses it to store the special remote config, but does not set it in the
git remote it creates. So the next run of git-annex does uuid discovery
again, and caches it that time. This could be improved for a tiny
speedup, but I didn't want to complicate things for that in this
commit.
Sponsored-by: Dartmouth College's DANDI project
Use cases include using git-annex init --no-autoenable and then going back
and enabling the special remotes that have autoenable configured. As well
as just querying to remember which ones have it enabled.
It lists all special remotes that have autoenable=yes whether currently
enabled or not. And it can be used with --json.
I pondered making this "git-annex info autoenable", but that seemed wrong
because then if the use has a directory named "autoenable", it's unclear
what they are asking for. (Although "git-annex info remote" may be
similarly unclear.) Making it an option does mean that it can't be provided
via --batch though.
Sponsored-by: Dartmouth College's Datalad project
Someone may disagree with what repositories are set to autoenable and
it's good to have local overrides.
See https://github.com/datalad/datalad/issues/6634
Sponsored-by: Dartmouth College's Datalad project
test: When limiting tests to run with -p, work around tasty limitation by
automatically including dependent tests.
This fixes a reversion because it didn't used to use dependencies and
forced tasty to run the init tests first. That changed when parallelizing
the test suite.
It will sometimes do a little more work than strictly required,
because it adds init tests deps when limited to eg quickcheck tests,
which don't depend on them. But this only adds a few seconds work.
Sponsored-by: Dartmouth College's Datalad project
Deal with git's recent changes to fix CVE-2022-24765, which prevent using
git in a repository owned by someone else.
That makes git config --list not list the repo's configs, only global
configs. So annex.uuid and annex.version are not visible to git-annex.
It displayed a message about that, which is not right for this situation.
Detect the situation and display a better message, similar to the one other
git commands display.
Also, git-annex init when run in that situation would overwrite annex.uuid
with a new one, since it couldn't see the old one. Add a check to prevent
it running too in this situation. It may be that this fix has security
implications, if a config set by the malicious user who owns the repo
causes git or git-annex to run code. I don't think any git-annex configs
get run by git-annex init. It may be that some git config of a command
does get run by one of the git commands that git-annex init runs. ("git
status" is the command that prompted the CVE-2022-24765, since
core.fsmonitor can cause it to run a command). Since I don't know how
to exploit this, I'm not treating it as a security fix for now.
Note that passing --git-dir makes git bypass the security check. git-annex
does pass --git-dir to most calls to git, which it does to avoid needing
chdir to the directory containing a git repository when accessing a remote.
So, it's possible that somewhere in git-annex it gets as far as running git
with --git-dir, and git reads some configs that are unsafe (what
CVE-2022-24765 is about). This seems unlikely, it would have to be part of
git-annex that runs in git repositories that have no (visible) annex.uuid,
and git-annex init is the only one that I can think of that then goes on to
run git, as discussed earlier. But I've not fully ruled out there being
others..
The git developers seem mostly worried about "git status" or a similar
command implicitly run by a shell prompt, not an explicit use of git in
such a repository. For example, Ævar Arnfjörð Bjarma wrote:
> * There are other bits of config that also point to executable things,
> e.g. core.editor, aliases etc, but nothing has been found yet that
> provides the "at a distance" effect that the core.fsmonitor vector
> does.
>
> I.e. a user is unlikely to go to /tmp/some-crap/here and run "git
> commit", but they (or their shell prompt) might run "git status", and
> if you have a /tmp/.git ...
Sponsored-by: Jarkko Kniivilä on Patreon
The purpose of this is to fix situations where the annex object file is
stored in a directory structure other than where annex symlinks point to.
But it will also move object files from the hashdirmixed back to
hashdirlower if the repo configuration makes that the normal location.
It would have been more work to avoid that than to let it do it.
Sponsored-by: Dartmouth College's Datalad project
Commit 36133f27c0 had a boolean flip in it,
aaargh.
Special remotes with importtree=yes or exporttree=yes are once again
treated as untrusted, since files stored in them can be deleted or modified
at any time.
Sponsored-by: Kevin Mueller on Patreon
Added support for "megabit" and related bandwidth units in
annex.stalldetection and everywhere else that git-annex parses data units.
Note that the short form is "Mbit" not "Mb" because that differs from "MB"
only in case, and git-annex parses units case-insensitively. It would be
horrible if two different versions of git-annex parsed the same value
differently, so I don't think "Mb" can be supported.
See comment for bonus sad story from my childhood.
Sponsored-by: Nicholas Golder-Manning
Avoid treating refs/annex/last-index or other refs that are not commit
objects as evidence of repository corruption.
The repair code checks to find bad refs by trying to run `git log` on
them, and assumes that no output means something is broken. But git log
on a tree object is empty.
This was worth fixing generally, not as a special case, since it's certainly
possible that other things store tree or other objects in refs.
Sponsored-by: Max Thoursie on Patreon
rsync 3.2.4 broke backwards-compatability by preventing exposing filenames
to the shell. Made the rsync and gcrypt special remotes detect this and
disable shellescape.
An alternative fix would have been to always set RSYNC_OLD_ARGS=1.
Which would avoid the overhead of probing rsync --help for each affected
remote. But that is really very fast to run, and it seemed better to switch
to the modern code path rather than keeping on using the bad old code path.
Sponsored-by: Tobias Ammann on Patreon
Using removePathForcibly avoids concurrent removal problems.
The i386ancient build still uses an old version of ghc and directory that
do not include removePathForcibly though.
Sponsored-by: Dartmouth College's Datalad project
assistant: When annex.autocommit is set, notice commits that the user makes
manually, and push them out to remotes promptly.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
Ignore annex.numcopies set to 0 in gitattributes or git config, or by
git-annex numcopies or by --numcopies, since that configuration would make
git-annex easily lose data. Same for mincopies.
This is a continuation of the work to make data only be able to be lost
when --force is used. It earlier led to the --trust option being disabled,
and similar reasoning applies here.
Most numcopies configs had docs that strongly discouraged setting it to 0
anyway. And I can't imagine a use case for setting to 0. Not that there
might not be one, but it's just so far from the intended use case of
git-annex, of managing and storing your data, that it does not seem like
it makes sense to cater to such a hypothetical use case, where any
git-annex drop can lose your data at any time.
Using a smart constructor makes sure every place avoids 0. Note that this
does mean that NumCopies is for the configured desired values, and not the
actual existing number of copies, which of course can be 0. The name
configuredNumCopies is used to make that clear.
Sponsored-by: Brock Spratlen on Patreon
Removed vendored copy of http-client-restricted, and removed the
HttpClientRestricted build flag that avoided that dependency.
http-client-restricted is in Debian stable, and the i386ancient build also
uses it, so I think this vendored copy is no longer needed.
Sponsored-by: Noam Kremen on Patreon
add: Avoid unncessarily converting a newly unlocked file to be stored
in git when it is not modified, even when annex.largefiles does not
match it.
This fixes a reversion in version 10.20220222, where git-annex unlock
followed by git-annex add, followed by git commit file could result in
git thinking the file was modified after the commit.
I do have half a mind to remove the withUnmodifiedUnlockedPointers part
of git-annex add. It seems weird, despite that old bug report arguing
a case of consistency that it ought to behave that way. When git-annex
add surpises me, it seems likely it's wrong.. But for now, this is the
smallest possible fix.
Sponsored-by: Dartmouth College's Datalad project
Directory special remotes with importtree=yes have changed to once more
take inodes into account. This will cause extra work when importing from a
directory on a FAT filesystem that changes inodes on every mount.
To avoid that extra work, set ignoreinodes=yes when initializing a new
directory special remote, or change the configuration of your existing
remote: git-annex enableremote foo ignoreinodes=yes
This will mean a one-time re-import of all contents from every directory
special remote due to the changed setting.
73df633a62 thought
it was too unlikely that there would be modifications that the inode number
was needed to notice. That was probably right; it's very unlikely that a
file will get modified and end up with the same size and mtime as before.
But, what was not considered is that a program like NextCloud might write
two files with different content so closely together that they share the
mtime. The inode is necessary to detect that situation.
Sponsored-by: Max Thoursie on Patreon
Default to the number of CPU cores, which seems about optimal
on my laptop. Using one more saves me 2 seconds actually.
Better packing of workers improves speed significantly.
In 2 tests runs, I saw segfaulting workers despite my attempt
to work around that issue. So detect when a worker does, and re-run it.
Removed installSignalHandlers again, because I was seeing an
error "lost signal due to full pipe", which I guess was somehow caused
by using it.
Sponsored-by: Dartmouth College's Datalad project
Avoid git-annex test being very slow when run from within the standalone
linux tarball or OSX app.
It may not really be necessary to add to PATH the directory where the
git-annex binary resides, but it can't hurt. Most places where the test
suite or git-annex run git-annex, they use programPath, so won't need
a modified PATH. But I'm not sure if that's always the case.
Sponsored-by: Dartmouth College's Datalad project
Propagate nonzero exit status from git ls-files when a specified file does
not exist, or a specified directory does not contain any files checked into
git.
The recent completion of the annex.skipunknown transition exposed this
bug, that has unfortunately been lurking all along.
It is also possible that git ls-files errors out for some other reason
-- perhaps a permission problem -- and this will also fix error propagation
in such situations.
Sponsored-by: Dartmouth College's Datalad project
When annex.freezecontent-command is set, and the filesystem does not
support removing write bits, avoid treating it as a crippled filesystem.
The hook may be enough to prevent writing on its own, and some filesystems
ignore attempts to remove write bits.
Sponsored-by: Dartmouth College's Datalad project
It will then proceed to add the file the same as if it were any other
file containing possibly annexable content. Usually the file is one that
was annexed before, so the new, probably corrupt content will also be added
to the annex. If the file was not annexed before, the content will be added
to git.
It's not possible for the smudge filter to throw an error here, because
git then just adds the file to git anyway.
Sponsored-by: Dartmouth College's Datalad project
This format is designed to detect accidental appends, while having some
room for future expansion.
Detect when an unlocked file whose content is not present has gotten some
other content appended to it, and avoid treating it as a pointer file, so
that appended content will not be checked into git, but will be annexed
like any other file.
Dropped the max size of a pointer file down to 32kb, it was around 80 kb,
but without any good reason and certianly there are no valid pointer files
anywhere that are larger than 8kb, because it's just been specified what it
means for a pointer file with additional data even looks like.
I assume 32kb will be good enough for anyone. ;-) Really though, it needs
to be some smallish number, because that much of a file in git gets read
into memory when eg, catting pointer files. And since we have no use cases
for the extra lines of a pointer file yet, except possibly to add
some human-visible explanation that it is a git-annex pointer file, 32k
seems as reasonable an arbitrary number as anything. Increasing it would be
possible, eg to 64k, as long as users of such jumbo pointer files didn't
mind upgrading all their git-annex installations to one that supports the
new larger size.
Sponsored-by: Dartmouth College's Datalad project
File matching options like --include will be rejected in situations where
there is no filename to match against. (Or where there is a filename but
it's not relative to the cwd, or otherwise seemed too bothersome to match
against.)
The addition of listKeys' was necessary to avoid using more memory in the
common case of "git-annex info". Adding a filterM would have caused the
list to buffer in memory and not stream. This is an ugly hack, but listKeys
had previously run Annex operations inside unafeInterleaveIO (for direct
mode). And matching against a matcher should hopefully not change any Annex
state.
This does allow for eg `git-annex info somefile --include=*.ext`
although why someone would want to do that I don't really know. But it
seems to make sense to allow it.
But, consider: `git-annex info ./somefile --include=somefile`
This does not match, so will not display info about somefile.
If the user really wants to, they can `--include=./somefile`.
Using matching options like --copies or --in=remote seems likely to be
slower than git-annex find with those options, because unlike such
commands, info does not have optimised streaming through the matcher.
Note that `git-annex info remote` is not the same as
`git-annex info --in remote`. The former shows info about all files in
the remote. The latter shows local keys that are also in that remote.
The output should make that clear, but this still seems like a point
where users could get confused.
Sponsored-by: Jochen Bartl on Patreon
Implemented by making Git.Queue have a FlushAction, which can accumulate
along with another action on files, and runs only once the other action has
run.
This lets git-annex unlock queue up git update-index actions, without
conflicting with the restagePointerFiles FlushActions.
In a repository with filter-process enabled, git-annex unlock will
often not take any more time than before, though it may when the files are
large. Either way, it should always slow down less than git-annex status
speeds up.
When filter-process is not enabled, git-annex unlock will slow down as much
as git status speeds up.
Sponsored-by: Jochen Bartl on Patreon
annex.skipunknown now defaults to false, so commands like `git annex get foo*`
will not silently skip over files/dirs that are not checked into git.
Sponsored-by: Brock Spratlen on Patreon
* registerurl, unregisterurl: Improved output when reading from stdin
to be more like other batch commands.
* registerurl, unregisterurl: Added --json and --json-error-messages options.
Note that this did change the --batch output in a way that could possibly
break something that expected the old output to never change. I think it's
acceptable to break that because there has never been a guarantee of
unchanging output format except with --batch for most commands. The old
output was just really weird too!
One possible wart is that "git-annex registerurl" with no options now
seems to just hang, since it's waiting for stdin input. Before, it said
"registerurl (stdin)" which was clearer about what's happenening. But this
is a deprecated mode anyway, --batch makes clear what's happening. If
anything, this problem would be a reason to eventually remove the support
for reading from stdin w/o --batch.
Sponsored-by: Dartmouth College's Datalad project
takeByteString can only be used at the end of a parser, not before other
input. This was a dumb enough mistake that I audited the rest of the
code base for similar mistakes. Pity that attoparsec cannot avoid it at
the type level.
Fixes git-annex forget propagation between repositories. (reversion
introduced in version 7.20190122)
Sponsored-by: Brock Spratlen on Patreon
Seems that --no-ext-diff and -c diff.external= are not enough to disable
external diff command when gitattributes textconv specifies it.
I'm pretty sure that --no-ext-diff and -c diff.external= are not both
needed, but not 100%. Something about -G may need the latter to fully
disable diffs in some cases. So kept that part as it was.
Sponsored-by: Dartmouth College's Datalad project
The "+" argument only runs the command once, so is not safe to use. Using
";" instead would have been the simplest fix, but also the slowest.
Since my phone has an xargs that supports -0, I piped find to xargs
instead. Unsure how portable this will be, perhaps some android's don't
have xargs -0 or find -printf to send null terminated output.
The business with pipefail is necessary to make a failure of find cause the
import to fail. Probably this works on all androids, but if not, it will
probably just result in a failure of find being ignored. It would be
possible to make ignorefinderror just disable setting pipefail, but then
if some android has a shell that has pipefail enabled by default, ignorefinderror
would not work, so I kept the || true approach for that.
Sponsored-by: Max Thoursie on Patreon
Reject combinations of --batch (or --batch-keys) with options like --all or
--key or with filenames.
Most commands ignored the non-batch items when batch mode was enabled.
For some reason, addurl and dropkey both processed first the specified
non-batch items, followed by entering batch mode. Changed them to also
error out, for consistency.
Sponsored-by: Dartmouth College's Datalad project