(And v9 later on to v10.)
When v9/v10 were added, making v8 automatically upgrade was deferred
"for a few months" to prevent interoperability problems if users also
have an old version of git-annex. Of course that could still be the
case, but there has been a good amount of time and this can't be put off
forever.
Allow setting annex.autoupgraderepository to false to avoid this upgrade.
Previously, that only prevented upgrades from no longer supported git-annex
versions, but v8 is still supported, and users may want to keep on v8 to
interoperate with an old git-annex version.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
I would like for a new repo version to enable appends, but to do so
safely would need a v11 followed by a 1 year delay followed by a v12
that does it. Since a similar v9 and v10 transition is currently
happening, and is less than 6 months along in most repos, it does not
feel wise to stack up another year-long transition behind that. What if
I need to hurry up a new repo version for some other change?
Added todo so I remember to make this change at some time when a v11
and probably v12 repo version do make sense.
Sponsored-by: Dartmouth College's DANDI project
An append that is interrupted and writes part of a line is now dealt
with by subsequent reads and appends. This also handles a read that
happens at the same time as an append to the file.
Old versions of git-annex will still see a partially written line,
and could get confused. Since appends are currently done for url logs
and location logs, the confusion is limited to a substring of the actual
url or UUID of the remote being read. This will not affect writes, since
the journal file is locked when reading in preparation for writing.
However, the bad data can be output by git-annex and used by other
things, or could cause surprising behavior by git-annex. Including eg,
downloading the content of the wrong url.
So, something needs to be done to prevent old versions of git-annex from
running in a repository where this appending is being done..
Sponsored-by: Dartmouth College's DANDI project
Added annex.alwayscompact setting which can be unset to speed up writes to
the git-annex branch in some cases.
Sponsored-by: Dartmouth College's DANDI project
Fix a reversion that prevented --batch commands (and the assistant)
from noticing data written to the journal by other commands.
I have not identified which commit broke this for sure,
but probably it was aeca7c2207
--batch commands that wrote to the journal avoided the problem since
journalIgnorable sets unset on write. It's a little bit surprising that
nobody noticed that query --batch commands did not see data written by
other commands.
Sponsored-by: Dartmouth College's DANDI project
To avoid using find -printf, which was first supported in Android around
2019-2020.
Probing seems too fragile, and execing stat once per file is too slow to do
when there's a faster way available, which brought me to an option...
Sponsored-by: Brett Eisenberg on Patreon
This caused git to complain that filter-process failed and kill it with
signal 15. Because it wrote an extra flushPkt for an empty file, which
git did not expect, and so git saw an unexpected response to the next
request.
Luckily, filter-process is only used by default in v9 and up, and v8 is
still the default. Also, git had to be updating an empty file, followed
by another file, which is a fairly unlikely situation. And git restarts
filter-process after this happens and uses it to filter the rest of the
files. So this isn't a crippling bug.
Sponsored-by: Luke Shumaker on Patreon
On Windows, that does not support long paths
https://github.com/jacobstanley/unix-compat/issues/56
Instead, use System.Directory.renamePath, which does support long paths.
Sponsored-by: Dartmouth College's Datalad project
The webapp modules cannot build with the assistant disabled, so make the
webapp be under the assistant build flag.
Sponsored-by: Jarkko Kniivilä on Patreon
--backend is no longer a global option, and is only accepted by commands
that actually need it.
Three commands that used to support backend but don't any longer are
watch, webapp, and assistant. It would be possible to make them support it,
but I doubt anyone used the option with these. And in the case of webapp
and assistant, the option was handled inconsistently, only taking affect
when the command is run with an existing git-annex repo, not when it
creates a new one.
Also, renamed GlobalOption etc to AnnexOption. Because there are many
options of this type that are not actually global (any more) and get
added to commands that need them.
Sponsored-by: Kevin Mueller on Patreon
Improve handling of parallelization with -J when copying content from/to a
git remote that is a local path.
Sponsored-by: Nicholas Golder-Manning on Patreon
At this point I've checked all AnnexState values and these were all that
remained that could move.
Pity that Annex.repo can't move, but it gets modified sometimes..
A couple of AnnexState values are set by options and could be AnnexRead,
but happen to use Annex when being set.
Sponsored-by: Max Thoursie on Patreon
When adding a small file, it does not get locked down, so can be modified
after git-annex checks that it's small. The use of queued git add made the
race window nice and wide too.
Fixed by checking if the file has changed, and by not using git add.
Instead, have to recapitulate git add's handling of things like symlinks
and executable files.
Sponsored-by: Jochen Bartl on Patreon
The remaining callers all did not rely on it checking gitignore, so were
easy to convert.
They were susceptable to the same overwrite race as add and fix,
although less likely to have it and a narrower window than add's race.
Command.Rekey in passing got an unncessary call to removeFile deleted.
addSymlink handles deleting any existing worktree file.
This is not a complete fix for all such races, only the one where a
large file gets changed while adding and gets added to git rather than
to the annex.
addLink needs to go away, any caller of it is probably subject to the
same kind of race. (Also, addLink itself fails to check gitignore when
symlinks are not supported.)
ingestAdd no longer checks gitignore. (It didn't check it consistently
before either, since there were cases where it did not run git add!)
When git-annex import calls it, it's already checked gitignore itself
earlier. When git-annex add calls it, it's usually on files found
by withFilesNotInGit, which handles checking ignores.
There was one other case, when git-annex add --batch calls it. In that
case, old git-annex behaved rather badly, it would seem to add the file,
but git add would later fail, leaving the file as an unstaged annex symlink.
That behavior has also been fixed.
Sponsored-by: Brett Eisenberg on Patreon
move: Improve resuming a move that succeeded in transferring the content,
but where dropping failed due to eg a network problem, in cases where
numcopies checks prevented the resumed move from dropping the object from
the source repository.
This was earlier done for moves that got interrupted during the drop stage.
Sponsored-by: Svenne Krap on Patreon
Fix retrival of an empty file that is stored in a special remote with
chunking enabled.
The speculative chunk stuff caused a reversion by adding an empty list for
the empty file. Which is just wrong; the empty file is still stored on the
remote, and should be retrieved like any other file. It uses 1 chunk, so
`max 1` is the simple fix.
Sponsored-by: Noam Kremen on Patreon
rather than matching path of an existing remote to find the uuid.
The main benefit of this is that locations not using ssh:// will work
now, including both paths and host:/path
The other benefit is that it's a simpler interface, no need to have an
existing remote with the same url and some other name. Although that
will still work of course.
This does rely on tryGitConfigRead working when given a Git.Repo that is
not a remote. Luckily, it works fine that way.
Also, tryGitConfigRead will auto-init a local repo that has a git-annex
branch. I did not enable auto-init of ssh repos though.
The uuid discovery actually happens twice; initremote discovers it,
and uses it to store the special remote config, but does not set it in the
git remote it creates. So the next run of git-annex does uuid discovery
again, and caches it that time. This could be improved for a tiny
speedup, but I didn't want to complicate things for that in this
commit.
Sponsored-by: Dartmouth College's DANDI project
to track down a blocked indefinitely on MVar that seems to occur after
sqlite throws ErrorBusy but that I have not been able to reproduce when
I made commits synthetically throw ErrorBusy.
Sponsored-by: Dartmouth College's Datalad project
Use cases include using git-annex init --no-autoenable and then going back
and enabling the special remotes that have autoenable configured. As well
as just querying to remember which ones have it enabled.
It lists all special remotes that have autoenable=yes whether currently
enabled or not. And it can be used with --json.
I pondered making this "git-annex info autoenable", but that seemed wrong
because then if the use has a directory named "autoenable", it's unclear
what they are asking for. (Although "git-annex info remote" may be
similarly unclear.) Making it an option does mean that it can't be provided
via --batch though.
Sponsored-by: Dartmouth College's Datalad project
Someone may disagree with what repositories are set to autoenable and
it's good to have local overrides.
See https://github.com/datalad/datalad/issues/6634
Sponsored-by: Dartmouth College's Datalad project
test: When limiting tests to run with -p, work around tasty limitation by
automatically including dependent tests.
This fixes a reversion because it didn't used to use dependencies and
forced tasty to run the init tests first. That changed when parallelizing
the test suite.
It will sometimes do a little more work than strictly required,
because it adds init tests deps when limited to eg quickcheck tests,
which don't depend on them. But this only adds a few seconds work.
Sponsored-by: Dartmouth College's Datalad project
This reverts windows-specific parts of 5a98f2d509
There were no code paths in common between windows and unix, so this
will return Windows to the old behavior.
The problem that the commit talks about has to do with multiple different
locations where git-annex can store annex object files, but that is not
too relevant to Windows anyway, because on windows the filesystem is always
treated as criplled and/or symlinks are not supported, so it will only
use one object location. It would need to be using a repo populated
in another OS to have the other object location in use probably.
Then a drop and get could possibly lead to a dangling lock file.
And, I was not able to actually reproduce that situation happening
before making that commit, even when I forced a race. So making these
changes on windows was just begging trouble..
I suspect that the change that caused the reversion is in
Annex/Content/Presence.hs. It checks if the content file exists,
and then called modifyContentDirWhenExists, which seems like it would
not fail, but if something deleted the content file at that point,
that call would fail. Which would result in an exception being thrown,
which should not normally happen from a call to inAnnexSafe. That was a
windows-specific change; the unix side did not have an equivilant
change.
Sponsored-by: Dartmouth College's Datalad project
setEnv is not thread safe and could cause a getEnv by another thread to
segfault, or perhaps other had behavior. This is particularly a problem
when using tasty, because tasty runs the test in a thread, and a getEnv
in another thread.
The use of top-level TMVars is ugly, but ok because only 1 test actually
runs at a time per process. Because it has to chdir into the test repo.
The setEnv that remains happens before tasty is running.
Sponsored-by: Dartmouth College's Datalad project
setEnv is not thread safe and could cause a getEnv by another thread to
segfault, or perhaps other had behavior.
Sponsored-by: Dartmouth College's Datalad project
The purpose of this is to fix situations where the annex object file is
stored in a directory structure other than where annex symlinks point to.
But it will also move object files from the hashdirmixed back to
hashdirlower if the repo configuration makes that the normal location.
It would have been more work to avoid that than to let it do it.
Sponsored-by: Dartmouth College's Datalad project
If the content directory does not exist, then it does not make sense to
lock the content file, as it also does not exist, and so it's ok for the
lock operation to fail.
This avoids potential races where the content file exists but is then
deleted/renamed, while another process sees that it exists and goes to
lock it, resulting in a dangling lock file in an otherwise empty object
directory.
Also renamed modifyContent to modifyContentDir since it is not only
necessarily used for modifying content files, but also other files in
the content directory.
Sponsored-by: Dartmouth College's Datalad project
Added support for "megabit" and related bandwidth units in
annex.stalldetection and everywhere else that git-annex parses data units.
Note that the short form is "Mbit" not "Mb" because that differs from "MB"
only in case, and git-annex parses units case-insensitively. It would be
horrible if two different versions of git-annex parsed the same value
differently, so I don't think "Mb" can be supported.
See comment for bonus sad story from my childhood.
Sponsored-by: Nicholas Golder-Manning
Avoid treating refs/annex/last-index or other refs that are not commit
objects as evidence of repository corruption.
The repair code checks to find bad refs by trying to run `git log` on
them, and assumes that no output means something is broken. But git log
on a tree object is empty.
This was worth fixing generally, not as a special case, since it's certainly
possible that other things store tree or other objects in refs.
Sponsored-by: Max Thoursie on Patreon
rsync 3.2.4 broke backwards-compatability by preventing exposing filenames
to the shell. Made the rsync and gcrypt special remotes detect this and
disable shellescape.
An alternative fix would have been to always set RSYNC_OLD_ARGS=1.
Which would avoid the overhead of probing rsync --help for each affected
remote. But that is really very fast to run, and it seemed better to switch
to the modern code path rather than keeping on using the bad old code path.
Sponsored-by: Tobias Ammann on Patreon
This avoids displaying the unexpected exit codes message when
the list is eg [ExitSuccess, ExitFailure 1].
Sponsored-by: Dartmouth College's Datalad project
If the temp directory can somehow contain a hard link, it changes the
mode, which affects all other hard linked files. So, it's too unsafe
to use everywhere in git-annex, since hard links are possible in
multiple ways and it would be very hard to prove that every place that
uses a temp directory cannot possibly put a hard link in it.
Added a call to removeDirectoryForCleanup to test_crypto, which will
fix the problem that commit 17b20a2450
was intending to fix, with a much smaller hammer.
Sponsored-by: Dartmouth College's Datalad project
Using removePathForcibly avoids concurrent removal problems.
The i386ancient build still uses an old version of ghc and directory that
do not include removePathForcibly though.
Sponsored-by: Dartmouth College's Datalad project
Tagging todos that seem to have a plan ready as confirmed.
Also closed some old ones for various reasons. Including several that
turn out to be addressed by newer features.
Also opened a new todo about git-annex-config needing a criteria to add
new configs to it.
assistant: When annex.autocommit is set, notice commits that the user makes
manually, and push them out to remotes promptly.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
Ignore annex.numcopies set to 0 in gitattributes or git config, or by
git-annex numcopies or by --numcopies, since that configuration would make
git-annex easily lose data. Same for mincopies.
This is a continuation of the work to make data only be able to be lost
when --force is used. It earlier led to the --trust option being disabled,
and similar reasoning applies here.
Most numcopies configs had docs that strongly discouraged setting it to 0
anyway. And I can't imagine a use case for setting to 0. Not that there
might not be one, but it's just so far from the intended use case of
git-annex, of managing and storing your data, that it does not seem like
it makes sense to cater to such a hypothetical use case, where any
git-annex drop can lose your data at any time.
Using a smart constructor makes sure every place avoids 0. Note that this
does mean that NumCopies is for the configured desired values, and not the
actual existing number of copies, which of course can be 0. The name
configuredNumCopies is used to make that clear.
Sponsored-by: Brock Spratlen on Patreon
prop_relPathDirToFileAbs_basics (TestableFilePath ":/") failed on
windows. The colon was filtered out after trying to make
the path relative, which only removed leading path separators.
So, ":/" changed to "/" which is not relative. Filtering out the colon
before hand avoids this problem.
Sponsored-by: Luke Shumaker on Patreon
add: Avoid unncessarily converting a newly unlocked file to be stored
in git when it is not modified, even when annex.largefiles does not
match it.
This fixes a reversion in version 10.20220222, where git-annex unlock
followed by git-annex add, followed by git commit file could result in
git thinking the file was modified after the commit.
I do have half a mind to remove the withUnmodifiedUnlockedPointers part
of git-annex add. It seems weird, despite that old bug report arguing
a case of consistency that it ought to behave that way. When git-annex
add surpises me, it seems likely it's wrong.. But for now, this is the
smallest possible fix.
Sponsored-by: Dartmouth College's Datalad project