Merge branch 'ospath'

This commit is contained in:
Joey Hess 2025-02-17 11:58:20 -04:00
commit 5324f34092
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
384 changed files with 4796 additions and 4542 deletions

View file

@ -22,3 +22,5 @@ Tried with (on a Manjaro box):
git-annex rules and is a marvelous tool.
I wanted to try the union merging to resolve merge conflicts on non-annexed files. It's not ideal, but might be better than repeated `git annex assist|sync` eventually adding the merge conflict markers `<<<<<<<<<` and the like to the files, breaking things like `.gitattributes` syntax which in turn has more devastating lockup consequences...
> removed it, [[done]] --[[Joey]]

View file

@ -1,38 +0,0 @@
# NAME
git-union-merge - Join branches together using a union merge
# SYNOPSIS
git union-merge ref ref newref
# DESCRIPTION
Does a union merge between two refs, storing the result in the
specified newref.
The union merge will always succeed, but assumes that files can be merged
simply by concatenating together lines from all the oldrefs, in any order.
So, this is useful only for branches containing log-type data.
Note that this does not touch the checked out working copy. It operates
entirely on git refs and branches.
# EXAMPLE
git union-merge git-annex origin/git-annex refs/heads/git-annex
Merges the current git-annex branch, and a version from origin,
storing the result in the git-annex branch.
# BUGS
File modes are not currently merged.
# AUTHOR
Joey Hess <id@joeyh.name>
<http://git-annex.branchable.com/>
Warning: Automatically converted into a man page by mdwn2man. Edit with care

View file

@ -68,7 +68,7 @@ internal tracking of information about git-annex repositories and annexed
objects.
The files stored in this branch are all designed to be auto-merged
using git's [[union merge driver|git-union-merge]]. So each line
by simply concacenating them together. So each line
has a timestamp, to allow the most recent information to be identified.
### `uuid.log`

View file

@ -1,75 +1,50 @@
For a long time (since 2019) git-annex has been progressively converting from
FilePath to RawFilePath (aka ByteString).
FilePath to RawFilePath. And more recently, to OsPath.
[[!meta title="OsPath conversion"]]
The reason is mostly performance, also a simpler representation of
filepaths that doesn't need encoding hacks to support non-UTF8 values.
Some commands like `git-annex find` use RawFilePath end-to-end.
But this conversion is not yet complete. This is a todo to keep track of the
status.
Some commands like `git-annex find` have been converted end-to-end
with good performance results. And OsPath is used very widly now.
But this conversion is not yet complete. This is a todo to keep track
of the status.
* The OsPath build flag makes git-annex build with OsPath. Otherwise,
it builds with RawFilePath. The plan is to make that build flag the
default where it is not already as time goes on. And then eventually
remove the build flag and simplify code in git-annex to not need to
support two different build methods.
* unix has modules that operate on RawFilePath but no OSPath versions yet.
See https://github.com/haskell/unix/issues/240
* filepath-1.4.100 implements support for OSPath. It is bundled with
ghc-9.6.1 and above. Will need to switch from filepath-bytestring to
this, and to avoid a lot of ifdefs, probably only after git-annex no
longers supports building with older ghc versions. This will entail
replacing all the RawFilePath with OsPath, which should be pretty
mechanical, with only some wrapper functions in Utility.FileIO and
Utility.RawFilePath needing to be changed.
However, this is not really a performance problem, because converting
from an OsPath to a RawFilePath in order to use such a function
is the same amount of work as calling a native OsPath version of the
function would be, because passing a ShortByteString into the FFI entails
making a copy of it.
Work on this is underway, in the `ospath` branch.
* filepath-bytestring is used when not building with OsPath. It's also
in Setup-Depends. In order to stop needing to maintain that library,
the goal is to eliminate it from dependencies. This may need to wait
until the OsPath build flag is removed and OsPath is always used.
* Git.LsFiles has several `pipeNullSplit'` calls that have toOsPath
mapped over the results. That adds an additional copy, so the lazy
ByteString is converted to strict,
and then to ShortByteString, with a copy each time. This is in the
critical path for large git repos, and might be a noticable slowdown.
There is currently no easy way to go direct from a lazy ByteString to a
ShortByteString, although it would certianly be possible to write low
level code to do it efficiently. Alternatively, it would be possible to
read a strict ByteString direct from a handle, like hGetLine does
(although in this case it would need to stop at the terminating 0 byte)
and unsafePerformIO to stream to a list would avoid needing to rewrite
this code to not use a list.
* OsPath has by now been pushed about as far as it will go, but here and
there use of FilePath remains in odd corners. These are unlikely to cause
any noticiable performance impact.
[[!tag confirmed]]
----
The following patch can be useful to find points where conversions are
done. Especially useful to identify cases where a value is converted
`FilePath -> RawFilePath -> FilePath`.
diff --git a/Utility/FileSystemEncoding.hs b/Utility/FileSystemEncoding.hs
index 2a1dc81bc1..03e6986f6e 100644
--- a/Utility/FileSystemEncoding.hs
+++ b/Utility/FileSystemEncoding.hs
@@ -84,6 +84,9 @@ encodeBL = L.fromStrict . encodeBS
encodeBL = L8.fromString
#endif
+debugConversions :: String -> IO ()
+debugConversions s = hPutStrLn stderr ("conversion: " ++ s)
+
decodeBS :: S.ByteString -> FilePath
#ifndef mingw32_HOST_OS
-- This does the same thing as System.FilePath.ByteString.decodeFilePath,
@@ -92,6 +95,7 @@ decodeBS :: S.ByteString -> FilePath
-- something other than a unix filepath.
{-# NOINLINE decodeBS #-}
decodeBS b = unsafePerformIO $ do
+ debugConversions (show b)
enc <- Encoding.getFileSystemEncoding
S.useAsCStringLen b (GHC.peekCStringLen enc)
#else
@@ -106,17 +110,19 @@ encodeBS :: FilePath -> S.ByteString
-- something other than a unix filepath.
{-# NOINLINE encodeBS #-}
encodeBS f = unsafePerformIO $ do
+ debugConversions f
enc <- Encoding.getFileSystemEncoding
- GHC.newCStringLen enc f >>= unsafePackMallocCStringLen
+ b <- GHC.newCStringLen enc f >>= unsafePackMallocCStringLen
+ return b
#else
encodeBS = S8.fromString
#endif
fromRawFilePath :: RawFilePath -> FilePath
-fromRawFilePath = decodeFilePath
+fromRawFilePath = decodeBS -- decodeFilePath
toRawFilePath :: FilePath -> RawFilePath
-toRawFilePath = encodeFilePath
+toRawFilePath = encodeBS -- encodeFilePath
{- Truncates a FilePath to the given number of bytes (or less),
- as represented on disk.