Continuing along the same lines as commit
2739adc258, it seems that
while Remote -> Retriever expands to the same data type this changes
it to, ghc 9.0.1 refuses to consider them equiviant. I guess it has
something to do with the forall?
The rest of the build all succeeds, although the stack build then crashes:
Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.4.0.0/build/git-annex/git-annex ...
Completed 233 action(s).
Prelude.chr: bad argument: 2214592520
This issue seems likely to be about it:
https://github.com/commercialhaskell/stack/pull/5508
I'm building with stack from debian, version 2.3.3, so a newer stack
probably avoids that. Anyway, despite that stack problem,
the git-annex binary is built, and works.
The stack.yaml I used for this build was patched as follows:
diff --git a/stack.yaml b/stack.yaml
index 8dac87c15..62c4b5b9d 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -1,6 +1,6 @@
flags:
git-annex:
- production: true
+ production: false
assistant: true
pairing: true
torrentparser: true
@@ -14,7 +14,7 @@ flags:
httpclientrestricted: true
packages:
- '.'
-resolver: lts-18.13
+resolver: nightly-2021-09-07
extra-deps:
- IfElse-0.85
- aws-0.22
Sponsored-by: Graham Spencer on Patreon
This reverts commit 66b2536ea0.
I misunderstood commit ac56a5c2a0
and caused a FD leak when pid locking is not used.
A LockHandle contains an action that will close the underlying lock
file, and that action is run when it is closed. In the case of a shared
lock, the lock file is opened once for each LockHandle, and only
the one for the LockHandle that is being closed will be closed.
Seem there are several races that happen when 2 threads run PidLock.tryLock
at the same time. One involves checkSaneLock of the side lock file, which may
be deleted by another process that is dropping the lock, causing checkSaneLock
to fail. And even with the deletion disabled, it can still fail, Probably due
to linkToLock failing when a second thread overwrites the lock file.
The same can happen when 2 processes do, but then one process just fails
to take the lock, which is fine. But with 2 threads, some actions where failing
even though the process as a whole had the pid lock held.
Utility.LockPool.PidLock already maintains a STM lock, and since it uses
LockShared, 2 threads can hold the pidlock at the same time, and when
the first thread drops the lock, it will remain held by the second
thread, and so the pid lock file should not get deleted until the last
thread to hold it drops the lock. Which is the right behavior, and why a
LockShared STM lock is used in the first place.
The problem is that each time it takes the STM lock, it then also calls
PidLock.tryLock. So that was getting called repeatedly and concurrently.
Fixed by noticing when the shared lock is already held, and stop calling
PidLock.tryLock again, just use the pid lock that already exists then.
Also, LockFile.PidLock.tryLock was deleting the pid lock when it failed
to take the lock, which was entirely wrong. It should only drop the side
lock.
Sponsored-by: Dartmouth College's Datalad project
It ought to exist, since linkToLock has just created it. However,
Lustre seems to have a rather probabilisitic view of the contents of a
directory, so catching the error if it somehow does not exist and
running the same code path that would be ran if linkToLock failed
might avoid this fun Lustre failure.
Sponsored-by: Dartmouth College's Datalad project
Commit b6e4ed9aa7 made non-annexed files
be re-uploaded every time, since they're not tracked in the location log,
and it made it check the location log. Don't do that for non-annexed files.
Sponsored-by: Brock Spratlen on Patreon
This version of git -- or its new default "ort" resolver -- handles such
a conflict by staging two files, one with the original name and the other
named file~ref. Use unmergedSiblingFile when the latter is detected.
(It doesn't do that when the conflict is between a directory and a file
or symlink though, so see previous commit for how that case is handled.)
The sibling file has to be deleted separately, because cleanConflictCruft
may not delete it -- that only handles files that are annex links,
but the sibling file may be the non-annexed file side of the conflict.
The graftin code had assumed that, when the other side of a conclict
is a symlink, the file in the work tree will contain the non-annexed
content that we want it to contain. But that is not the case with the new
git; the file may be the annex link and needs to be replaced with the
content, while the annex link will be written as a -variant file.
(The weird doesDirectoryExist check in graftin turns out to still be
needed, test suite failed when I tried to remove it.)
Test suite passes with new git with ort resolver default. Have not tried it
with old git or other defaults.
Sponsored-by: Noam Kremen on Patreon
The new "ort" resolver uses different filenames than what the test suite
accepted when resolving a conflict between a directory an an annexed
file. Make the test looser in what it accepts, so it will work with old
and new git.
Other tests still look for "conflictor.variant" as a prefix,
because when eg resolving a conflicted merge of 2 annexed files,
the filename is not changed by the ort resolver, and I didn't want to
unncessarily loosen the test.
Also I'm not entirely happy with the filenames used by the ort resolver,
see comment.
There's still another test failure caused by that resolver that is not
fixed yet.
Bugfix: When -J was enabled, getting files leaked a ever-growing number of
git cat-file processes.
(Since commit dd39e9e255)
The leak happened when mergeState called stopNonConcurrentSafeCoProcesses.
While stopNonConcurrentSafeCoProcesses usually manages to stop everything,
there was a race condition where cat-file processes were leaked. Because
catFileStop modifies Annex.catfilehandles in a non-concurrency safe way,
and could clobber modifications made in between. Which should have been ok,
since originally catFileStop was only used at shutdown.
Note the comment on catFileStop saying it should only be used when nothing
else is using the handles. It would be possible to make catFileStop
race-safe, but it should just not be used in a situation where a race is
possible. So I didn't bother.
Instead, the fix is just not to stop any processes in mergeState. Because
in order for mergeState to be called, dupState must have been run, and it
enables concurrency mode, stops any non-concurrent processes, and so all
processes that are running are concurrency safea. So there is no need to
stop them when merging state. Indeed, stopping them would be extra work,
even if there was not this bug.
Sponsored-by: Dartmouth College's Datalad project
When non-concurrent git coprocesses have been started, setConcurrency
used to not stop them, and so could leak processes when enabling
concurrency, eg when forkState is called.
I do not think that ever actually happened, given where setConcurrency
is called. And it probably would only leak one of each process, since it
never downgrades from concurrent to non-concurrent.
See comment for analysis.
At first I thought I'd need to convert all T.unpack in git-annex, but
luckily not -- so long as the Text is read from a file, the filesystem
encoding is applied and T.unpack is fine. It's only when using Feed
that the filesystem encoding is not applied.
While this fixes the crash, it does result in some mojibake, eg:
itemid=http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-���-questions/
Have not tracked that down, but it must be unrelated, because
I've verified that it roundtrips when using encodeUf8:
joey@darkstar:~/src/git-annex>LANG=C ghci Utility/FileSystemEncoding.hs
ghci> useFileSystemEncoding
ghci> Just f <- Text.Feed.Import.parseFeedFromFile "/home/joey/tmp/career_tools_podcasts.xml"
ghci> Just (_, x) = Text.Feed.Query.getItemId (Text.Feed.Query.feedItems f !! 0)
ghci> decodeBS (Data.Text.Encoding.encodeUtf8 x)
"http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-\56546\56448\56467-questions/"
ghci> writeFile "foo" $ decodeBS (Data.Text.Encoding.encodeUtf8 x)
Writes a file containing the ENDASH character.
Sponsored-by: Jochen Bartl on Patreon
I saw a user open a forum post that looked like they had read this man
page, but were unaware of git-annex unlock, and were looking for its
functionality. They later deleted the post, probably when they found
git-annex unlock. Looking at this man page, it was a bit unclear about
the motivation for the command.
Yes, I'm warching... ;-)
While intended for converting URL keys added by addurl --fast to be
as if added by addurl --relaxed, it can also be used to remove size
from other types of keys. Although that is not likely to be useful
for checksummed keys, I suppose it could be used for WORM or other
non-checksum keys.
Specifying the --remove-size option does not prevent other migrations
from taking effect if there's a key upgrade to perform, or if the
backend has changed. So --backend=URL needs to be used to prevent
migrating an URL key to the default backend.
Note that it's not possible to use git-annex migrate to convert from a
non-URL key to an URL key, as URL keys cannot be generated, except by
addurl. So while this can get the same effect as --relaxed would have
when addurl --fast was used, when --fast was not used, it won't work, or
if --backend=URL is not used will remove the size but not prevent
checksum verification, which is not useful. Due to this complexity, I
decided not to mention it in the git-annex addurl man page.
Sponsored-by: Jochen Bartl on Patreon
git-lfs: Fix interoperability with gitlab's implementation of the git-lfs
protocol, which requests Content-Encoding chunked.
Sponsored-by: Dartmouth College's Datalad project
* uninit: Avoid error message when no commits have been made to the
repository yet.
* uninit: Avoid error message when there is no git-annex branch.
Sponsored-by: Svenne Krap on Patreon
Based on my earlier benchmark, I have a rough cost model for how
expensive it is for git-annex smudge to be run on a file, vs
how expensive it is for a gigabyte of a file's content to be read and
piped through to filter-process.
So, using that cost model, it can decide if using filter-process will
be more or less expensive than running the smudge filter on the files to
be restaged.
It turned out to be *really* annoying to temporarily disable
filter-process. I did find a way, but urk, this is horrible. Notice
that, if it's interrupted with it disabled, it will remain disabled
until the next time restagePointerFile runs. Which could be some time
later. If the user runs `git add` or `git checkout` on a lot of small
files before that, they will see slower than expected performance.
(This commit also deletes where I wrote down the benchmark results
earlier.)
Sponsored-by: Noam Kremen on Patreon
filter-process: New command that can make git add/checkout faster when
there are a lot of unlocked annexed files or non-annexed files, but that
also makes git add of large annexed files slower.
Use it by running: git
config filter.annex.process 'git-annex filter-process'
Fully tested and working, but I have not benchmarked it at all.
And, incremental hashing is not done when git add uses it, so extra work is
done in that case.
Sponsored-by: Mark Reidenbach on Patreon
metadata --batch --json: Reject input whose "fields" does not consist of
arrays of strings. Such invalid input used to be silently ignored.
Used to be that parseJSON for a JSONActionItem ran parseJSON separately
for the itemAdded, and if that failed, did not propagate the error. That
allowed different items with differently named fields to be parsed.
But it was actually only used to parse "fields" for metadata, so that
flexability is not needed.
The fix is just to parse "fields" as-is. AddJSONActionItemFields is needed
only because of the wonky way Command.MetaData adds onto the started json
object.
Note that this line got a dummy type signature added,
just because the type checker needs it to be some type.
itemFields = Nothing :: Maybe Bool
Since it's Nothing, it doesn't really matter what type it is,
and the value gets turned into json and is then thrown away.
Sponsored-by: Kevin Mueller on Patreon
Turns out that CommandStart actions do not have their exceptions caught,
which is why the giveup was causing a crash. Mostly these actions
do not do very much work on their own, but it does seem possible there
are other commands whose CommandStart also throws an exception.
So, my first attempt at a fix was to catch those exceptions. But,
--json-error-messages then causes a difficulty, because in order to output
a json error message, an action needs to have been started; that sets up
the json object that the error message will be included in a field of.
While it would be possible to output an object with just an error field,
this would be json output of a format that the user has no reason to
expect, that happens only in an exceptional circumstance. That is something
I have always wanted to avoid with the json output; while git-annex man
pages don't document what the json looks like, the output has always
been made to be self-describing. Eg, it includes "error-messages":[]
even when there's no errors.
With that ruled out, it doesn't seem a good idea to catch CommandStart
exceptions and display the error to stderr when --json-error-messages
is set. And so I don't know if it makes sense to catch exceptions from that
at all. Maybe I'd have a different opinion if --json-error-messages did not
exist though.
So instead, output a blank line like other batch commands do.
This also leaves open the possibility of implementing support for matching
object with metadata --json, which would also want to output a blank line
when the input didn't match.
Sponsored-by: Dartmouth College's DANDI project