Commit graph

1118 commits

Author SHA1 Message Date
Joey Hess
6ebab7fb00
factor out Annex.GitShaKey 2025-03-03 11:09:28 -04:00
Joey Hess
3bec89a3c3
started git-annex recompute
The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
2025-02-26 11:54:09 -04:00
Joey Hess
a154e91513
add git-annex addcomputed
Working pretty well. Mostly. But:

* Does not yet support inputs that are non-annexed files checked into git
* --fast is currently broken (will need something like VURL keys)
* --unreproducible still uses a checksumming backend, so drop and get
  again will likely fail (needs probably to use an URL key or something
  like one)

The compute special remote seems to work pretty well too. Eg,
getting from it works, and dropping content that is present in it works.
2025-02-25 15:50:08 -04:00
Joey Hess
c1b53dbbd0
wip 2025-02-20 13:27:47 -04:00
Joey Hess
17c1a1dde4
fix description of ParallelBuild 2025-02-12 12:32:22 -04:00
Joey Hess
2ff716be30
OsPath build flag no longer depends on filepath-bytestring
However, filepath-bytestring is still in Setup-Depends.
That's because Utility.OsPath uses it when not built with OsPath.
It would be maybe possible to make Utility.OsPath fall back to using
filepath, and eliminate that dependency too, but it would mean either
wrapping all of System.FilePath's functions, or using `type OsPath = FilePath`

Annex.Import uses ifdefs to avoid converting back to FilePath when not
on windows. On windows it's a bit slower due to that conversion.
Utility.Path.Windows.convertToWindowsNativeNamespace got a bit
slower too, but not really worth optimising I think.

Note that importing Utility.FileSystemEncoding at the same time as
System.Posix.ByteString will result in conflicting definitions for
RawFilePath. filepath-bytestring avoids that by importing RawFilePath
from System.Posix.ByteString, but that's not possible in
Utility.FileSystemEncoding, since Setup-Depends does not include unix.
This turned out not to affect any code in git-annex though.

Sponsored-by: Leon Schuermann
2025-02-10 16:39:55 -04:00
Joey Hess
c21526e3e2
Merge branch 'ospath-mk1' into ospath 2025-01-30 14:33:08 -04:00
Joey Hess
fb7a0ccb4c
Revert "disable OsPath build flag on windows for now"
This reverts commit 55cf9ce28f.

Problem was fixed by commit c1e90767da
2025-01-30 14:32:44 -04:00
Joey Hess
97c83152d6
Merge branch 'master' into ospath 2025-01-29 18:48:02 -04:00
Joey Hess
55cf9ce28f
disable OsPath build flag on windows for now
Test suite failure looks like this:

        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        fatal: could not open '\\?\C:\Users\appveyor\AppData\Local\Temp\1\hash-cc81b41d-dfda-4ae8-904b-b531742443cc' for reading: No such file or directory
        git-annex: fd:4: Data.ByteString.hGetLine: end of file
        git-annex: user error (git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--no-filters","--stdin-paths"] exited 128)

This is apparently in Git.HashObject, and probably in hashBlob, which uses a
temp file with a name starting with "hash", but I have not been able to tell
what's wrong.

I don't understand where the "\\?" path prefix (windows UNC-style path)
is coming from in the path that gets fed into git hash-file, or why git
hash-file can't open the file.
2025-01-29 17:22:21 -04:00
Joey Hess
12660314f1
continue conversion
Add Utility.OsString, with a special case for length.
2025-01-23 11:46:35 -04:00
Joey Hess
1faa3af9cd
add file-io to build-depends when building with OsPath flag
Partly converted code to use functions from it, though more remain
unconverted. Most of withFile and openFile now use it.
2025-01-21 14:26:04 -04:00
Joey Hess
71f83bfff7
rename OsString to OsPath 2025-01-21 11:57:44 -04:00
Joey Hess
1ceece3108
RawFilePath conversion of System.Directory
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.

Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.

Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
2025-01-20 19:17:33 -04:00
Joey Hess
a55e1da1aa
preparing release 2025-01-15 11:39:40 -04:00
Joey Hess
b4e406952c
prep for release tomorrow and copyright year update 2025-01-01 14:24:57 -04:00
Joey Hess
da5e195597
remove i386ancient and need at least debian stable to build
* Removed the i386ancient standalone tarball build for linux, which
  was increasingly unable to support new git-annex features.
* Removed support for building with ghc older than 9.0.2,
  and with older versions of haskell libraries than are in current Debian
  stable.
* stack.yaml: Update to lts-23.2.

Note that i386ancient was targeting linux 2.6.32, which has been EOL for
over 9 years now. Any old system still using such a kernel is certainly highly
insecure. And I suspect i386ancient had its own insecurities due to haskell
libraries and C libraries not having been updated.
2025-01-01 14:15:55 -04:00
Joey Hess
430f6bc9c7
releasing package git-annex version 10.20241202 2024-12-02 12:36:24 -04:00
Joey Hess
8663c72f1e
git-remote-annex: Fix buggy behavior when annex.stalldetection is configured
Make programPath never return "git-remote-annex" or other known multi-call
program names, which are not git-annex and won't behave like it.
If the git-annex binary gets installed under some entirely other name,
it will still return it.

This change exposed that readProgramFile actually could crash,
which happened before only if getExecutablePath was not absolute
and there was no ~/.config/git-annex/program. So fixed that to catch
exception.
2024-11-25 12:14:52 -04:00
Joey Hess
9b7378fb79
prepare for release tomorrow 2024-10-30 14:08:47 -04:00
Joey Hess
b83fdf66df
Allow enabling the servant build flag with older versions of stm
Allowing building with ghc 9.0.2 (debian stable).

Updated patch covering all uses of writeTMVar.
2024-10-17 20:55:31 -04:00
Joey Hess
3a53c60121
Allow enabling the servant build flag with older versions of stm
Allowing building with ghc 9.0.2 (debian stable).
2024-10-17 14:04:31 -04:00
Joey Hess
76a1989a0e
implement openFileBeingWritten
This bypasses the usual haskell file locking used to prevent opening a
file for read that is being written to.

This is unfortunately a bit of a hack. But it seems fairly unlikely to
get broken by changes to ghc. I hope. Using fdToHandle' will also work.

This does not work on windows because it uses openFd from posix. It
would probably be possible to implement it for windows too, just opening
the FD using the Win32 library instead. However, whether windows will
allow reading from a file that is also being written to I don't know,
and since in the git-annex case the writer could be another process (eg
external special remote), that might be doing its own locking in
windows, that seems a can of worms I'd prefer not to open.
2024-10-15 11:56:42 -04:00
Joey Hess
e8e4347fcc
update version for release 2024-09-27 10:01:44 -04:00
Joey Hess
84bbbeae9d
started on sim file parser 2024-09-11 11:53:25 -04:00
Joey Hess
b932acf4ad
started Annex.Sim
Have most of the sim command handler, but to keep it pure while implementing
the rest will need some refactoring.

It seems likely that running the simulation itself will not be able to be
entirely pure. Preferred content evaluation runs in Annex after all.

Note that the somewhat awkward randomWords is because the i386ancient
build depends on a version of random too old to support generating a
random ByteString on its own.
2024-09-04 15:15:36 -04:00
Joey Hess
b3dc656153
releasing package git-annex version 10.20240831 2024-08-31 19:50:26 -04:00
Joey Hess
c3d40b9ec3
plumb in LiveUpdate (WIP)
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.

So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.

There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.

Not currently in a compilable state.
2024-08-23 16:35:12 -04:00
Joey Hess
06064f897c
update Annex.reposizes when changing location logs
The live update is only needed when Annex.reposizes has already been
populated.
2024-08-15 13:27:14 -04:00
Joey Hess
745bc5c547
take maxsize into account for balanced preferred content
This is very innefficient, it will need to be optimised not to
calculate the sizes of repos every time.

Also, fixed a bug in balancedPicker that caused it to pick a too high
index when some repos were excluded due to being full.
2024-08-13 11:00:20 -04:00
Joey Hess
99a126bebb
added reposize database
The idea is that upon a merge of the git-annex branch, or a commit to
the git-annex branch, the reposize database will be updated. So it
should always accurately reflect the location log sizes, but it will
often be behind the actual current sizes.

Annex.reposizes will start with the value from the database, and get
updated with each transfer, so it will reflect a process's best
understanding of the current sizes.

When there are multiple processes all transferring to the same repo,
Annex.reposize will not reflect transfers made by the other processes
since the current process started. So when using balanced preferred
content, it may make suboptimal choices, including trying to transfer
content to the repo when another process has already filled it up.
But this is the same as if there are multiple processes running on
ifferent machines, so is acceptable. The reposize will eventually
get an accurate value reflecting changes made by other processes or in
other repos.
2024-08-12 11:19:58 -04:00
Joey Hess
1265d7e5df
implement maxsize log and command
* maxsize: New command to tell git-annex how large the expected maximum
  size of a repository is.
* vicfg: Include maxsize configuration.
2024-08-11 15:41:26 -04:00
Joey Hess
bd5affa362
use hmac in balanced preferred content
This deals with the possible security problem that someone could make an
unusually low UUID and generate keys that are all constructed to hash to
a number that, mod the number of repositories in the group, == 0.
So balanced preferred content would always put those keys in the
repository with the low UUID as long as the group contains the
number of repositories that the attacker anticipated.
Presumably the attacker than holds the data for ransom? Dunno.

Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs
combined together as the "secret", and the key as the "message". Now any
change in the set of UUIDs in a group will invalidate the attacker's
constructed keys from hashing to anything in particular.

Given that there are plenty of other things someone can do if they can
write to the repository -- including modifying preferred content so only
their repository wants files, and numcopies so other repositories drom
them -- this seems like safeguard enough.

Note that, in balancedPicker, combineduuids is memoized.
2024-08-10 16:32:54 -04:00
Joey Hess
c15c32b5f8
releasing package git-annex version 10.20240808 2024-08-08 15:27:04 -04:00
Joey Hess
c1bc0bffc8
releasing package git-annex version 10.20240731 2024-07-31 14:05:01 -04:00
Joey Hess
d1b641cb1e
update stack.yaml to nightly-2024-07-29 and remove stack-lts-18.13.yaml
Primarily because Windows needs a dependency bump to get stm-2.5.1
for Servant build flag.

This includes Win32-2.13.4.0 and aws-0.24 which adds some features
that windows had been missing out on as well.

Lots of warnings about head and tail will need to eventually be
addressed. Of course AFAIK the uses of it in git-annex are all safe.
2024-07-29 20:09:37 -04:00
Joey Hess
54f2ea2b85
fix syntax 2024-07-29 19:13:31 -04:00
Joey Hess
ebe81ccdfa
servant build flag needs stm-2.5.1
For writeTMVar. Would be possible to rewrite to use something else, but
I don't want to. Might be possible to write a writeTMVar that works with
the old version of stm.
2024-07-29 19:10:00 -04:00
Joey Hess
ab22938c0b
fix build without servant 2024-07-24 15:13:02 -04:00
Joey Hess
b7454f1eeb
protocol version fallback on 404
and prettified errors
2024-07-23 14:58:49 -04:00
Joey Hess
b0eed55d4f
factor out http server and client into own modules
To avoid a cycle when Remote.Git uses the client.
2024-07-23 14:12:38 -04:00
Joey Hess
6bbc4565e6
started wiring p2phttp into Remote.Git
but we have a cycle, ugh
2024-07-23 13:53:10 -04:00
Joey Hess
5c39652235
starting support for remote.name.annexUrl set to annex+http
In this case, Remote.Git should not use that url for all access to
the repository. It will only be used for annex operations, which isn't
done yet.
2024-07-23 09:12:21 -04:00
Joey Hess
fdb888a56a
update servant build flag
make it work when building w/o assistant
2024-07-23 08:53:56 -04:00
Joey Hess
b290a72025
update deps 2024-07-11 14:51:45 -04:00
Joey Hess
9a592f946f
split module 2024-07-08 21:12:23 -04:00
Joey Hess
82d66ede5e
convert lockcontent api to http long polling
Websockets would work, but the problem with using them for this is that
each lockcontent call is a separate websocket connection. And that's an
actual TCP connection. One TCP connection per file dropped would be too
expensive. With http long polling, regular http pipelining can be used,
so it will reuse a TCP connection.

Unfortunately, at least with servant, bi-directional streams with long
polling don't result in true bidirectional full duplex communication.
Servant processes the whole client body stream before generating the server
body stream. I think it's entirely possible to do full bi-directional
communication over http, but it would need changes to servant.

And, there's no way for the client to tell if the server successfully
locked the content, since the server will keep processing the client
stream no matter what.:

So, added a new api endpoint, keeplocked. lockcontent will lock the key
for 10 minutes with retention lock, and then a call to keeplocked will
keep it locked for as long as needed. This does mean that there will
need to be a Map of locks by key, and I will probably want to add
some kind of lock identifier that lockcontent returns.
2024-07-08 12:57:46 -04:00
Joey Hess
9ee005e49a
dummy HasClient ClientM WebSocket
Enough to let lockcontent routes be included and servant-client be used.
But not enough to use servant-client with those routes. May need to
implement a separate runner for that part of the protocol?

Also some misc other stuff needed to use servant-client.

And fix exposing of UUID in the JSON types. UUID does actually have
aeson instances, but they're used elsewhere (metadata --batch, although
only included to get it to compile, not actually used in there) and not
suitable for use here since this must work with every possible UUID.
2024-07-07 21:21:45 -04:00
Joey Hess
9a726cedf6
servant server now compiling
Just need to fill in some undefined
2024-07-07 14:48:20 -04:00
Joey Hess
1dbb5ec70d
servant API type is complete 2024-07-07 12:59:12 -04:00