The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
Working pretty well. Mostly. But:
* Does not yet support inputs that are non-annexed files checked into git
* --fast is currently broken (will need something like VURL keys)
* --unreproducible still uses a checksumming backend, so drop and get
again will likely fail (probably needs to use a URL key or something
like one)
The compute special remote seems to work pretty well too. Eg,
getting from it works, and dropping content that is present in it works.
However, filepath-bytestring is still in Setup-Depends.
That's because Utility.OsPath uses it when not built with OsPath.
It might be possible to make Utility.OsPath fall back to using
filepath, and eliminate that dependency too, but it would mean either
wrapping all of System.FilePath's functions, or using `type OsPath = FilePath`.
Annex.Import uses ifdefs to avoid converting back to FilePath when not
on windows. On windows it's a bit slower due to that conversion.
Utility.Path.Windows.convertToWindowsNativeNamespace got a bit
slower too, but not really worth optimising I think.
Note that importing Utility.FileSystemEncoding at the same time as
System.Posix.ByteString will result in conflicting definitions for
RawFilePath. filepath-bytestring avoids that by importing RawFilePath
from System.Posix.ByteString, but that's not possible in
Utility.FileSystemEncoding, since Setup-Depends does not include unix.
This turned out not to affect any code in git-annex though.
Sponsored-by: Leon Schuermann
Test suite failure looks like this:
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
git-annex: fd:4: Data.ByteString.hGetLine: end of file
git-annex: user error (git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--no-filters","--stdin-paths"] exited 128)
This is apparently in Git.HashObject, and probably in hashBlob, which uses a
temp file with a name starting with "hash", but I have not been able to tell
what's wrong.
I don't understand where the "\\?" path prefix (windows UNC-style path)
is coming from in the path that gets fed into git hash-object, or why git
hash-object can't open the file.
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.
Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.
Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
* Removed the i386ancient standalone tarball build for linux, which
was increasingly unable to support new git-annex features.
* Removed support for building with ghc older than 9.0.2,
and with older versions of haskell libraries than are in current Debian
stable.
* stack.yaml: Update to lts-23.2.
Note that i386ancient was targeting linux 2.6.32, which has been EOL for
over 9 years now. Any old system still using such a kernel is certainly highly
insecure. And I suspect i386ancient had its own insecurities due to haskell
libraries and C libraries not having been updated.
Make programPath never return "git-remote-annex" or other known multi-call
program names, which are not git-annex and won't behave like it.
If the git-annex binary gets installed under some entirely other name,
it will still return it.
This change exposed that readProgramFile actually could crash,
which happened before only if getExecutablePath was not absolute
and there was no ~/.config/git-annex/program. So fixed that to catch
the exception.
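
A minimal sketch of the two behaviours described above, not the actual
git-annex code. The names usableProgramPath, readProgramFileSafe and
multiCallNames are made up for illustration, and the list of multi-call
names is an assumption; only "git-remote-annex" is named here.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.FilePath (takeFileName)

-- Hypothetical list of multi-call program names. Only "git-remote-annex"
-- comes from the text above; the rest is an assumption.
multiCallNames :: [String]
multiCallNames = ["git-remote-annex", "git-annex-shell"]

-- Reject a path whose file name is a known multi-call name.
-- Any other name, however unusual, is returned unchanged.
usableProgramPath :: FilePath -> Maybe FilePath
usableProgramPath p
  | takeFileName p `elem` multiCallNames = Nothing
  | otherwise = Just p

-- Read the first line of a program file, returning Nothing instead of
-- throwing when the file is missing, unreadable, or empty.
readProgramFileSafe :: FilePath -> IO (Maybe FilePath)
readProgramFileSafe f = do
  r <- try readAll :: IO (Either IOException String)
  return $ case r of
    Right s | (l:_) <- lines s -> Just l
    _ -> Nothing
  where
    -- Force the whole file so lazy read errors are raised inside try.
    readAll = do
      s <- readFile f
      _ <- evaluate (length s)
      return s
```

A caller could try usableProgramPath on the executable path first, and
only consult the program file when that yields Nothing.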
This bypasses the usual haskell file locking used to prevent opening a
file for read that is being written to.
This is unfortunately a bit of a hack. But it seems fairly unlikely to
get broken by changes to ghc. I hope. Using fdToHandle' will also work.
This does not work on windows because it uses openFd from posix. It
would probably be possible to implement it for windows too, just opening
the FD using the Win32 library instead. However, whether windows will
allow reading from a file that is also being written to I don't know,
and since in the git-annex case the writer could be another process (eg
external special remote), that might be doing its own locking in
windows, that seems a can of worms I'd prefer not to open.
Have most of the sim command handler, but keeping it pure while implementing
the rest will need some refactoring.
It seems likely that running the simulation itself will not be able to be
entirely pure. Preferred content evaluation runs in Annex after all.
Note that the somewhat awkward randomWords is because the i386ancient
build depends on a version of random too old to support generating a
random ByteString on its own.
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.
So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.
There may be some uses of NoLiveUpdate in places where a live update
should be done instead. All will need to be double checked.
Not currently in a compilable state.
This is very inefficient; it will need to be optimised not to
calculate the sizes of repos every time.
Also, fixed a bug in balancedPicker that caused it to pick too high an
index when some repos were excluded due to being full.
The idea is that upon a merge of the git-annex branch, or a commit to
the git-annex branch, the reposize database will be updated. So it
should always accurately reflect the location log sizes, but it will
often be behind the actual current sizes.
Annex.reposizes will start with the value from the database, and get
updated with each transfer, so it will reflect a process's best
understanding of the current sizes.
When there are multiple processes all transferring to the same repo,
Annex.reposizes will not reflect transfers made by the other processes
since the current process started. So when using balanced preferred
content, it may make suboptimal choices, including trying to transfer
content to the repo when another process has already filled it up.
But this is the same as if there are multiple processes running on
different machines, so is acceptable. The reposize will eventually
get an accurate value reflecting changes made by other processes or in
other repos.
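
The in-process bookkeeping described above could be sketched roughly
like this. It is an illustration only: RepoSizes and recordTransfer are
hypothetical names, UUID is simplified to a String, and the starting map
stands in for whatever was loaded from the reposize database.

```haskell
import qualified Data.Map.Strict as M

-- Simplified stand-in for git-annex's UUID type.
type UUID = String

-- Size in bytes of each repository, as this process understands it.
type RepoSizes = M.Map UUID Integer

-- Adjust the size of one repository after a transfer. The delta is
-- positive when content was stored there and negative when it was
-- dropped. A repository not yet in the map starts from the delta.
recordTransfer :: UUID -> Integer -> RepoSizes -> RepoSizes
recordTransfer u delta = M.insertWith (+) u delta
```

Each process only applies its own transfers, which is why two processes
filling the same repository can disagree until the database catches up.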
This deals with the possible security problem that someone could make an
unusually low UUID and generate keys that are all constructed to hash to
a number that, mod the number of repositories in the group, == 0.
So balanced preferred content would always put those keys in the
repository with the low UUID as long as the group contains the
number of repositories that the attacker anticipated.
Presumably the attacker then holds the data for ransom? Dunno.
Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs
combined together as the "secret", and the key as the "message". Now any
change in the set of UUIDs in a group will stop the attacker's
constructed keys from hashing to anything in particular.
Given that there are plenty of other things someone can do if they can
write to the repository -- including modifying preferred content so only
their repository wants files, and numcopies so other repositories drop
them -- this seems like safeguard enough.
Note that, in balancedPicker, combineduuids is memoized.
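
A rough sketch of the HMAC scheme described above, not the real
balancedPicker. It assumes the crypton (or cryptonite) and memory
packages are available, represents UUIDs and keys as plain ByteStrings,
and the name mkPicker is made up. Partially applying mkPicker to the
UUID list shares the combined secret across keys, which is one way to
get the memoization mentioned above.

```haskell
import Crypto.Hash (SHA256)
import Crypto.MAC.HMAC (HMAC, hmac)
import qualified Data.ByteArray as BA
import qualified Data.ByteString.Char8 as B8
import Data.List (foldl', sort)

-- Build a picker for a group of UUIDs. The picker maps a key to an
-- index into the sorted UUID list, or Nothing for an empty group.
mkPicker :: [B8.ByteString] -> B8.ByteString -> Maybe Int
mkPicker uuids
  | null uuids = const Nothing
  | otherwise = \key ->
      Just (fromInteger (toInt (hmac secret key) `mod` count))
  where
    -- All UUIDs combined, in a fixed order, act as the HMAC "secret".
    secret = B8.concat (sort uuids)
    count = toInteger (length uuids)
    -- Interpret the MAC bytes as one big-endian integer.
    toInt :: HMAC SHA256 -> Integer
    toInt = foldl' (\acc w -> acc * 256 + toInteger w) 0 . BA.unpack
```

Because the secret changes whenever a UUID is added to or removed from
the group, keys crafted against one group membership land on an
unpredictable index under any other.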
Primarily because Windows needs a dependency bump to get stm-2.5.1
for Servant build flag.
This includes Win32-2.13.4.0 and aws-0.24, which add some features
that windows had been missing out on as well.
Lots of warnings about head and tail will need to eventually be
addressed. Of course AFAIK the uses of them in git-annex are all safe.
For writeTMVar. Would be possible to rewrite to use something else, but
I don't want to. Might be possible to write a writeTMVar that works with
the old version of stm.
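
For what it's worth, a writeTMVar that works with older stm could look
something like this sketch. The name writeTMVarCompat is made up; the
behaviour aims to match what stm documents for writeTMVar, namely a
non-blocking write that puts if empty and replaces if full.

```haskell
import Control.Concurrent.STM
  ( STM, TMVar, atomically, newEmptyTMVarIO, newTMVarIO
  , putTMVar, readTMVar, tryTakeTMVar )

-- Non-blocking write: discard any current value, then store the new one.
-- Both steps run in the same STM transaction, so no other thread can
-- observe the TMVar empty in between.
writeTMVarCompat :: TMVar a -> a -> STM ()
writeTMVarCompat v new = tryTakeTMVar v >> putTMVar v new
```

It only uses tryTakeTMVar and putTMVar, both of which predate
stm-2.5.1.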
Websockets would work, but the problem with using them for this is that
each lockcontent call is a separate websocket connection. And that's an
actual TCP connection. One TCP connection per file dropped would be too
expensive. With http long polling, regular http pipelining can be used,
so it will reuse a TCP connection.
Unfortunately, at least with servant, bi-directional streams with long
polling don't result in true bidirectional full duplex communication.
Servant processes the whole client body stream before generating the server
body stream. I think it's entirely possible to do full bi-directional
communication over http, but it would need changes to servant.
And, there's no way for the client to tell if the server successfully
locked the content, since the server will keep processing the client
stream no matter what.
So, added a new api endpoint, keeplocked. lockcontent will lock the key
for 10 minutes with retention lock, and then a call to keeplocked will
keep it locked for as long as needed. This does mean that there will
need to be a Map of locks by key, and I will probably want to add
some kind of lock identifier that lockcontent returns.
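
One possible shape for that Map of locks, as a pure sketch. Everything
here is an assumption made for illustration: the type and function
names, Key as a String, time as plain seconds, and keepLocked extending
the lock by a caller-supplied number of seconds rather than holding it
for the life of a connection.

```haskell
import qualified Data.Map.Strict as M

type Key = String      -- simplified stand-in for a git-annex key
type LockId = Int      -- hypothetical identifier returned to the client
type Seconds = Integer

data Lock = Lock { lockId :: LockId, expires :: Seconds }

data LockTable = LockTable { nextId :: LockId, locks :: M.Map Key Lock }

emptyTable :: LockTable
emptyTable = LockTable 0 M.empty

-- lockcontent: lock the key for 10 minutes and return a fresh LockId.
-- This sketch simply replaces any existing lock on the same key.
lockContent :: Seconds -> Key -> LockTable -> (LockId, LockTable)
lockContent now k (LockTable n m) =
  (n, LockTable (n + 1) (M.insert k (Lock n (now + 600)) m))

-- keeplocked: extend an unexpired lock when the LockId matches.
-- Returns Nothing if the lock is unknown, expired, or held by another id.
keepLocked :: Seconds -> Seconds -> Key -> LockId -> LockTable -> Maybe LockTable
keepLocked now extra k i t = case M.lookup k (locks t) of
  Just l | lockId l == i && expires l > now ->
    Just t { locks = M.insert k l { expires = now + extra } (locks t) }
  _ -> Nothing
```

The identifier lets the server tell a keeplocked call for the current
lock apart from a stale one for an earlier lock on the same key.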
Enough to let lockcontent routes be included and servant-client be used.
But not enough to use servant-client with those routes. May need to
implement a separate runner for that part of the protocol?
Also some misc other stuff needed to use servant-client.
And fix exposing of UUID in the JSON types. UUID does actually have
aeson instances, but they're used elsewhere (metadata --batch, although
only included to get it to compile, not actually used in there) and not
suitable for use here since this must work with every possible UUID.