--backend is no longer a global option, and is only accepted by commands
that actually need it.
Three commands that used to support backend but don't any longer are
watch, webapp, and assistant. It would be possible to make them support it,
but I doubt anyone used the option with these. And in the case of webapp
and assistant, the option was handled inconsistently, only taking affect
when the command is run with an existing git-annex repo, not when it
creates a new one.
Also, renamed GlobalOption etc to AnnexOption. Because there are many
options of this type that are not actually global (any more) and get
added to commands that need them.
Sponsored-by: Kevin Mueller on Patreon
Improve handling of parallelization with -J when copying content from/to a
git remote that is a local path.
Sponsored-by: Nicholas Golder-Manning on Patreon
When adding a small file, it does not get locked down, so can be modified
after git-annex checks that it's small. The use of queued git add made the
race window nice and wide too.
Fixed by checking if the file has changed, and by not using git add.
Instead, have to recapitulate git add's handling of things like symlinks
and executable files.
Sponsored-by: Jochen Bartl on Patreon
The remaining callers all did not rely on it checking gitignore, so were
easy to convert.
They were susceptable to the same overwrite race as add and fix,
although less likely to have it and a narrower window than add's race.
Command.Rekey in passing got an unncessary call to removeFile deleted.
addSymlink handles deleting any existing worktree file.
This is not a complete fix for all such races, only the one where a
large file gets changed while adding and gets added to git rather than
to the annex.
addLink needs to go away, any caller of it is probably subject to the
same kind of race. (Also, addLink itself fails to check gitignore when
symlinks are not supported.)
ingestAdd no longer checks gitignore. (It didn't check it consistently
before either, since there were cases where it did not run git add!)
When git-annex import calls it, it's already checked gitignore itself
earlier. When git-annex add calls it, it's usually on files found
by withFilesNotInGit, which handles checking ignores.
There was one other case, when git-annex add --batch calls it. In that
case, old git-annex behaved rather badly, it would seem to add the file,
but git add would later fail, leaving the file as an unstaged annex symlink.
That behavior has also been fixed.
Sponsored-by: Brett Eisenberg on Patreon
move: Improve resuming a move that succeeded in transferring the content,
but where dropping failed due to eg a network problem, in cases where
numcopies checks prevented the resumed move from dropping the object from
the source repository.
This was earlier done for moves that got interrupted during the drop stage.
Sponsored-by: Svenne Krap on Patreon
Fix retrival of an empty file that is stored in a special remote with
chunking enabled.
The speculative chunk stuff caused a reversion by adding an empty list for
the empty file. Which is just wrong; the empty file is still stored on the
remote, and should be retrieved like any other file. It uses 1 chunk, so
`max 1` is the simple fix.
Sponsored-by: Noam Kremen on Patreon
rather than matching path of an existing remote to find the uuid.
The main benefit of this is that locations not using ssh:// will work
now, including both paths and host:/path
The other benefit is that it's a simpler interface, no need to have an
existing remote with the same url and some other name. Although that
will still work of course.
This does rely on tryGitConfigRead working when given a Git.Repo that is
not a remote. Luckily, it works fine that way.
Also, tryGitConfigRead will auto-init a local repo that has a git-annex
branch. I did not enable auto-init of ssh repos though.
The uuid discovery actually happens twice; initremote discovers it,
and uses it to store the special remote config, but does not set it in the
git remote it creates. So the next run of git-annex does uuid discovery
again, and caches it that time. This could be improved for a tiny
speedup, but I didn't want to complicate things for that in this
commit.
Sponsored-by: Dartmouth College's DANDI project
to track down a blocked indefinitely on MVar that seems to occur after
sqlite throws ErrorBusy but that I have not been able to reproduce when
I made commits synthetically throw ErrorBusy.
Sponsored-by: Dartmouth College's Datalad project
test: When limiting tests to run with -p, work around tasty limitation by
automatically including dependent tests.
This fixes a reversion because it didn't used to use dependencies and
forced tasty to run the init tests first. That changed when parallelizing
the test suite.
It will sometimes do a little more work than strictly required,
because it adds init tests deps when limited to eg quickcheck tests,
which don't depend on them. But this only adds a few seconds work.
Sponsored-by: Dartmouth College's Datalad project
This reverts windows-specific parts of 5a98f2d509
There were no code paths in common between windows and unix, so this
will return Windows to the old behavior.
The problem that the commit talks about has to do with multiple different
locations where git-annex can store annex object files, but that is not
too relevant to Windows anyway, because on windows the filesystem is always
treated as criplled and/or symlinks are not supported, so it will only
use one object location. It would need to be using a repo populated
in another OS to have the other object location in use probably.
Then a drop and get could possibly lead to a dangling lock file.
And, I was not able to actually reproduce that situation happening
before making that commit, even when I forced a race. So making these
changes on windows was just begging trouble..
I suspect that the change that caused the reversion is in
Annex/Content/Presence.hs. It checks if the content file exists,
and then called modifyContentDirWhenExists, which seems like it would
not fail, but if something deleted the content file at that point,
that call would fail. Which would result in an exception being thrown,
which should not normally happen from a call to inAnnexSafe. That was a
windows-specific change; the unix side did not have an equivilant
change.
Sponsored-by: Dartmouth College's Datalad project
Avoid treating refs/annex/last-index or other refs that are not commit
objects as evidence of repository corruption.
The repair code checks to find bad refs by trying to run `git log` on
them, and assumes that no output means something is broken. But git log
on a tree object is empty.
This was worth fixing generally, not as a special case, since it's certainly
possible that other things store tree or other objects in refs.
Sponsored-by: Max Thoursie on Patreon
This avoids displaying the unexpected exit codes message when
the list is eg [ExitSuccess, ExitFailure 1].
Sponsored-by: Dartmouth College's Datalad project
If the temp directory can somehow contain a hard link, it changes the
mode, which affects all other hard linked files. So, it's too unsafe
to use everywhere in git-annex, since hard links are possible in
multiple ways and it would be very hard to prove that every place that
uses a temp directory cannot possibly put a hard link in it.
Added a call to removeDirectoryForCleanup to test_crypto, which will
fix the problem that commit 17b20a2450
was intending to fix, with a much smaller hammer.
Sponsored-by: Dartmouth College's Datalad project
Using removePathForcibly avoids concurrent removal problems.
The i386ancient build still uses an old version of ghc and directory that
do not include removePathForcibly though.
Sponsored-by: Dartmouth College's Datalad project
prop_relPathDirToFileAbs_basics (TestableFilePath ":/") failed on
windows. The colon was filtered out after trying to make
the path relative, which only removed leading path separators.
So, ":/" changed to "/" which is not relative. Filtering out the colon
before hand avoids this problem.
Sponsored-by: Luke Shumaker on Patreon
add: Avoid unncessarily converting a newly unlocked file to be stored
in git when it is not modified, even when annex.largefiles does not
match it.
This fixes a reversion in version 10.20220222, where git-annex unlock
followed by git-annex add, followed by git commit file could result in
git thinking the file was modified after the commit.
I do have half a mind to remove the withUnmodifiedUnlockedPointers part
of git-annex add. It seems weird, despite that old bug report arguing
a case of consistency that it ought to behave that way. When git-annex
add surpises me, it seems likely it's wrong.. But for now, this is the
smallest possible fix.
Sponsored-by: Dartmouth College's Datalad project
Directory special remotes with importtree=yes have changed to once more
take inodes into account. This will cause extra work when importing from a
directory on a FAT filesystem that changes inodes on every mount.
To avoid that extra work, set ignoreinodes=yes when initializing a new
directory special remote, or change the configuration of your existing
remote: git-annex enableremote foo ignoreinodes=yes
This will mean a one-time re-import of all contents from every directory
special remote due to the changed setting.
73df633a62 thought
it was too unlikely that there would be modifications that the inode number
was needed to notice. That was probably right; it's very unlikely that a
file will get modified and end up with the same size and mtime as before.
But, what was not considered is that a program like NextCloud might write
two files with different content so closely together that they share the
mtime. The inode is necessary to detect that situation.
Sponsored-by: Max Thoursie on Patreon
Propagate nonzero exit status from git ls-files when a specified file does
not exist, or a specified directory does not contain any files checked into
git.
The recent completion of the annex.skipunknown transition exposed this
bug, that has unfortunately been lurking all along.
It is also possible that git ls-files errors out for some other reason
-- perhaps a permission problem -- and this will also fix error propagation
in such situations.
Sponsored-by: Dartmouth College's Datalad project
It did nothing, since at this point the link is dangling. But when there
is a thaw hook, it would probably not be happy to be asked to run on a
symlink, or might do something unexpected.
Sponsored-by: Dartmouth College's Datalad project
When annex.freezecontent-command is set, and the filesystem does not
support removing write bits, avoid treating it as a crippled filesystem.
The hook may be enough to prevent writing on its own, and some filesystems
ignore attempts to remove write bits.
Sponsored-by: Dartmouth College's Datalad project
It will then proceed to add the file the same as if it were any other
file containing possibly annexable content. Usually the file is one that
was annexed before, so the new, probably corrupt content will also be added
to the annex. If the file was not annexed before, the content will be added
to git.
It's not possible for the smudge filter to throw an error here, because
git then just adds the file to git anyway.
Sponsored-by: Dartmouth College's Datalad project
This format is designed to detect accidental appends, while having some
room for future expansion.
Detect when an unlocked file whose content is not present has gotten some
other content appended to it, and avoid treating it as a pointer file, so
that appended content will not be checked into git, but will be annexed
like any other file.
Dropped the max size of a pointer file down to 32kb, it was around 80 kb,
but without any good reason and certianly there are no valid pointer files
anywhere that are larger than 8kb, because it's just been specified what it
means for a pointer file with additional data even looks like.
I assume 32kb will be good enough for anyone. ;-) Really though, it needs
to be some smallish number, because that much of a file in git gets read
into memory when eg, catting pointer files. And since we have no use cases
for the extra lines of a pointer file yet, except possibly to add
some human-visible explanation that it is a git-annex pointer file, 32k
seems as reasonable an arbitrary number as anything. Increasing it would be
possible, eg to 64k, as long as users of such jumbo pointer files didn't
mind upgrading all their git-annex installations to one that supports the
new larger size.
Sponsored-by: Dartmouth College's Datalad project
takeByteString can only be used at the end of a parser, not before other
input. This was a dumb enough mistake that I audited the rest of the
code base for similar mistakes. Pity that attoparsec cannot avoid it at
the type level.
Fixes git-annex forget propagation between repositories. (reversion
introduced in version 7.20190122)
Sponsored-by: Brock Spratlen on Patreon
Seems that --no-ext-diff and -c diff.external= are not enough to disable
external diff command when gitattributes textconv specifies it.
I'm pretty sure that --no-ext-diff and -c diff.external= are not both
needed, but not 100%. Something about -G may need the latter to fully
disable diffs in some cases. So kept that part as it was.
Sponsored-by: Dartmouth College's Datalad project
The "+" argument only runs the command once, so is not safe to use. Using
";" instead would have been the simplest fix, but also the slowest.
Since my phone has an xargs that supports -0, I piped find to xargs
instead. Unsure how portable this will be, perhaps some android's don't
have xargs -0 or find -printf to send null terminated output.
The business with pipefail is necessary to make a failure of find cause the
import to fail. Probably this works on all androids, but if not, it will
probably just result in a failure of find being ignored. It would be
possible to make ignorefinderror just disable setting pipefail, but then
if some android has a shell that has pipefail enabled by default, ignorefinderror
would not work, so I kept the || true approach for that.
Sponsored-by: Max Thoursie on Patreon
Reject combinations of --batch (or --batch-keys) with options like --all or
--key or with filenames.
Most commands ignored the non-batch items when batch mode was enabled.
For some reason, addurl and dropkey both processed first the specified
non-batch items, followed by entering batch mode. Changed them to also
error out, for consistency.
Sponsored-by: Dartmouth College's Datalad project
autoUpgradeableVersions had latestVersion (10), but it did not make
sense for asking for old version 6 to get version 10, while asking for
version 8 got version 8. So use defaultVersion (8) instead.
Sponsored-by: Dartmouth College's Datalad project
The v10 upgrade should almost be safe now. What remains to be done is
notice when the v10 upgrade has occurred, while holding the shared lock,
and switch to using v10 lock files.
Sponsored-by: Dartmouth College's Datalad project
Do not populate the keys database with associated files,
because a bare repo has no working tree, and so it does not make sense to
populate it.
Queries of associated files in the keys database always return empty lists
in a bare repo, even if it's somehow populated. One way it could be
populated is if a user converts a non-bare repo to a bare repo.
Note that Git.Config.isBare does a string comparison, so this is not free!
But, that string comparison is very small compared to a sqlite query.
Sponsored-by: Erik Bjäreholt on Patreon
Recover from corrupted content being received from a git remote due eg to a
wire error, by deleting the temporary file when it fails to verify. This
prevents a retry from failing again.
Reversion introduced in version 8.20210903, when incremental verification
was added.
Only the git remote seems to be affected, although it is certianly
possible that other remotes could later have the same issue. This only
affects things passed to getViaTmp that return (False, UnVerified) due to
verification failing. As far as getViaTmp can tell, that could just as well
mean that the transfer failed in a way that would resume, so it cannot
delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally,
while other remotes do not, which is why only it seems affected.
A better fix perhaps would be to improve the types of the callback
passed to getViaTmp, so that some other value could be used to indicate
the state where the transfer succeeded but verification failed.
Sponsored-by: Boyd Stephen Smith Jr.
So that importing does not replace them with plain files.
This works similarly to how the previous handling of submodules and
matchers did, except that annexed symlinks still get exported as plain
files of course, it's only non-annexed symlinks that it does not make sense
to export.
When symlinks have previously been exported, updating the export will
unexport them after upgrading to this commit.
Sponsored-by: Kevin Mueller on Patreon
Capstone to this feature. Any transitions that have been performed on an
unmerged remote ref but not on the local git-annex branch, or vice-versa
have to be applied on the fly when reading files.
Sponsored-by: Dartmouth College's Datalad project
Continuing along the same lines as commit
2739adc258, it seems that
while Remote -> Retriever expands to the same data type this changes
it to, ghc 9.0.1 refuses to consider them equiviant. I guess it has
something to do with the forall?
The rest of the build all succeeds, although the stack build then crashes:
Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.4.0.0/build/git-annex/git-annex ...
Completed 233 action(s).
Prelude.chr: bad argument: 2214592520
This issue seems likely to be about it:
https://github.com/commercialhaskell/stack/pull/5508
I'm building with stack from debian, version 2.3.3, so a newer stack
probably avoids that. Anyway, despite that stack problem,
the git-annex binary is built, and works.
The stack.yaml I used for this build was patched as follows:
diff --git a/stack.yaml b/stack.yaml
index 8dac87c15..62c4b5b9d 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -1,6 +1,6 @@
flags:
git-annex:
- production: true
+ production: false
assistant: true
pairing: true
torrentparser: true
@@ -14,7 +14,7 @@ flags:
httpclientrestricted: true
packages:
- '.'
-resolver: lts-18.13
+resolver: nightly-2021-09-07
extra-deps:
- IfElse-0.85
- aws-0.22
Sponsored-by: Graham Spencer on Patreon
This reverts commit 66b2536ea0.
I misunderstood commit ac56a5c2a0
and caused a FD leak when pid locking is not used.
A LockHandle contains an action that will close the underlying lock
file, and that action is run when it is closed. In the case of a shared
lock, the lock file is opened once for each LockHandle, and only
the one for the LockHandle that is being closed will be closed.
Seem there are several races that happen when 2 threads run PidLock.tryLock
at the same time. One involves checkSaneLock of the side lock file, which may
be deleted by another process that is dropping the lock, causing checkSaneLock
to fail. And even with the deletion disabled, it can still fail, Probably due
to linkToLock failing when a second thread overwrites the lock file.
The same can happen when 2 processes do, but then one process just fails
to take the lock, which is fine. But with 2 threads, some actions where failing
even though the process as a whole had the pid lock held.
Utility.LockPool.PidLock already maintains a STM lock, and since it uses
LockShared, 2 threads can hold the pidlock at the same time, and when
the first thread drops the lock, it will remain held by the second
thread, and so the pid lock file should not get deleted until the last
thread to hold it drops the lock. Which is the right behavior, and why a
LockShared STM lock is used in the first place.
The problem is that each time it takes the STM lock, it then also calls
PidLock.tryLock. So that was getting called repeatedly and concurrently.
Fixed by noticing when the shared lock is already held, and stop calling
PidLock.tryLock again, just use the pid lock that already exists then.
Also, LockFile.PidLock.tryLock was deleting the pid lock when it failed
to take the lock, which was entirely wrong. It should only drop the side
lock.
Sponsored-by: Dartmouth College's Datalad project
It ought to exist, since linkToLock has just created it. However,
Lustre seems to have a rather probabilisitic view of the contents of a
directory, so catching the error if it somehow does not exist and
running the same code path that would be ran if linkToLock failed
might avoid this fun Lustre failure.
Sponsored-by: Dartmouth College's Datalad project
Commit b6e4ed9aa7 made non-annexed files
be re-uploaded every time, since they're not tracked in the location log,
and it made it check the location log. Don't do that for non-annexed files.
Sponsored-by: Brock Spratlen on Patreon
This version of git -- or its new default "ort" resolver -- handles such
a conflict by staging two files, one with the original name and the other
named file~ref. Use unmergedSiblingFile when the latter is detected.
(It doesn't do that when the conflict is between a directory and a file
or symlink though, so see previous commit for how that case is handled.)
The sibling file has to be deleted separately, because cleanConflictCruft
may not delete it -- that only handles files that are annex links,
but the sibling file may be the non-annexed file side of the conflict.
The graftin code had assumed that, when the other side of a conclict
is a symlink, the file in the work tree will contain the non-annexed
content that we want it to contain. But that is not the case with the new
git; the file may be the annex link and needs to be replaced with the
content, while the annex link will be written as a -variant file.
(The weird doesDirectoryExist check in graftin turns out to still be
needed, test suite failed when I tried to remove it.)
Test suite passes with new git with ort resolver default. Have not tried it
with old git or other defaults.
Sponsored-by: Noam Kremen on Patreon
The new "ort" resolver uses different filenames than what the test suite
accepted when resolving a conflict between a directory an an annexed
file. Make the test looser in what it accepts, so it will work with old
and new git.
Other tests still look for "conflictor.variant" as a prefix,
because when eg resolving a conflicted merge of 2 annexed files,
the filename is not changed by the ort resolver, and I didn't want to
unncessarily loosen the test.
Also I'm not entirely happy with the filenames used by the ort resolver,
see comment.
There's still another test failure caused by that resolver that is not
fixed yet.
See comment for analysis.
At first I thought I'd need to convert all T.unpack in git-annex, but
luckily not -- so long as the Text is read from a file, the filesystem
encoding is applied and T.unpack is fine. It's only when using Feed
that the filesystem encoding is not applied.
While this fixes the crash, it does result in some mojibake, eg:
itemid=http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-���-questions/
Have not tracked that down, but it must be unrelated, because
I've verified that it roundtrips when using encodeUf8:
joey@darkstar:~/src/git-annex>LANG=C ghci Utility/FileSystemEncoding.hs
ghci> useFileSystemEncoding
ghci> Just f <- Text.Feed.Import.parseFeedFromFile "/home/joey/tmp/career_tools_podcasts.xml"
ghci> Just (_, x) = Text.Feed.Query.getItemId (Text.Feed.Query.feedItems f !! 0)
ghci> decodeBS (Data.Text.Encoding.encodeUtf8 x)
"http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-\56546\56448\56467-questions/"
ghci> writeFile "foo" $ decodeBS (Data.Text.Encoding.encodeUtf8 x)
Writes a file containing the ENDASH character.
Sponsored-by: Jochen Bartl on Patreon
git-lfs: Fix interoperability with gitlab's implementation of the git-lfs
protocol, which requests Content-Encoding chunked.
Sponsored-by: Dartmouth College's Datalad project
* uninit: Avoid error message when no commits have been made to the
repository yet.
* uninit: Avoid error message when there is no git-annex branch.
Sponsored-by: Svenne Krap on Patreon
metadata --batch --json: Reject input whose "fields" does not consist of
arrays of strings. Such invalid input used to be silently ignored.
Used to be that parseJSON for a JSONActionItem ran parseJSON separately
for the itemAdded, and if that failed, did not propagate the error. That
allowed different items with differently named fields to be parsed.
But it was actually only used to parse "fields" for metadata, so that
flexability is not needed.
The fix is just to parse "fields" as-is. AddJSONActionItemFields is needed
only because of the wonky way Command.MetaData adds onto the started json
object.
Note that this line got a dummy type signature added,
just because the type checker needs it to be some type.
itemFields = Nothing :: Maybe Bool
Since it's Nothing, it doesn't really matter what type it is,
and the value gets turned into json and is then thrown away.
Sponsored-by: Kevin Mueller on Patreon
Turns out that CommandStart actions do not have their exceptions caught,
which is why the giveup was causing a crash. Mostly these actions
do not do very much work on their own, but it does seem possible there
are other commands whose CommandStart also throws an exception.
So, my first attempt at a fix was to catch those exceptions. But,
--json-error-messages then causes a difficulty, because in order to output
a json error message, an action needs to have been started; that sets up
the json object that the error message will be included in a field of.
While it would be possible to output an object with just an error field,
this would be json output of a format that the user has no reason to
expect, that happens only in an exceptional circumstance. That is something
I have always wanted to avoid with the json output; while git-annex man
pages don't document what the json looks like, the output has always
been made to be self-describing. Eg, it includes "error-messages":[]
even when there's no errors.
With that ruled out, it doesn't seem a good idea to catch CommandStart
exceptions and display the error to stderr when --json-error-messages
is set. And so I don't know if it makes sense to catch exceptions from that
at all. Maybe I'd have a different opinion if --json-error-messages did not
exist though.
So instead, output a blank line like other batch commands do.
This also leaves open the possibility of implementing support for matching
object with metadata --json, which would also want to output a blank line
when the input didn't match.
Sponsored-by: Dartmouth College's DANDI project
addurl: Support adding the same url to multiple files at the same time when
using -J with --batch --with-files.
Implementation was easier than expected, was able to reuse OnlyActionOn.
While it will download the url's content multiple times, that seems like
the best thing to do; see my comment for why.
Sponsored-by: Dartmouth College's DANDI project
This makes it be displayed in the error-messages field with
--json-error-messages. And with --quiet, it will let it be displayed,
which makes sense because it's telling the user why what they requested
to do has failed to happen.
Caused by dirContains ".." "foo" being incorrectly False.
Also added a test of dirContains, which includes all the previous bug fixes
I could find and some obvious cases.
Reversion in version 8.20211011
Sponsored-by: Brett Eisenberg on Patreon
Fix bug that caused stale git-annex branch information to read when
annex.private or remote.name.annex-private is set.
The private journal file should not prevent reading more current
information from the git-annex branch, but used to.
Note that, overBranchFileContents has to do additional work now, when
there's a private journal file, it reads from the branch redundantly
and more slowly.
Sponsored-by: Jack Hill on Patreon