Commit graph

2295 commits

Author SHA1 Message Date
Joey Hess
85efc13e3a
avoid build warning with recent ghc
foldl' is in Prelude now. Explicitly import Data.List still
for older systems and add explicit Prelude import to avoid warning.
2025-01-21 12:00:16 -04:00
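A minimal sketch of the import arrangement described above, not the actual git-annex code. `sumStrict` is a hypothetical function, and placing the Prelude import after Data.List is an assumption about what keeps the redundant-import warning away.

```haskell
{-# OPTIONS_GHC -Wall #-}

-- Explicit import so that older base versions, whose Prelude does not
-- export foldl', still compile.
import Data.List (foldl')
-- Explicit Prelude import, as in the commit message. Listing it after
-- Data.List is an assumption of this sketch.
import Prelude

-- | Hypothetical example function: a strict left fold over a list.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict [1 .. 10])
```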
Joey Hess
c7cca43ab0
RawFilePath conversion for Utility.Directory.Stream 2025-01-20 19:25:52 -04:00
Joey Hess
1ceece3108
RawFilePath conversion of System.Directory
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.

Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.

Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
2025-01-20 19:17:33 -04:00
Joey Hess
e5be81f8d4
stop exporting Utility.SystemDirectory from Utility.Directory 2025-01-20 19:10:25 -04:00
Joey Hess
42d55bc57c
pre-init config and hook
Added annex.pre-init-command git config and pre-init-annex hook that is run
before git-annex repository initialization.

This can block initialization. Or it can perform pre-initialization
configuration or tweaking.

I left stdio connected while it's running, so it could also be used for
interactive prompting conceivably, although that would want to use /dev/tty
anyway probably in order to not pollute the stdout of a command when
automatic initialization is done.

Sponsored-by: Dartmouth College's OpenNeuro project
2025-01-13 14:22:49 -04:00
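A rough sketch of how a pre-init command could block initialization, assuming a nonzero exit status means "do not initialize". `preInitAllows` is a hypothetical name, not git-annex's implementation. `system` leaves stdio connected to the child, matching the behavior described above.

```haskell
import System.Exit (ExitCode (..))
import System.Process (system)

-- | Run the configured command (if any) through the shell, with stdio
-- inherited. Returns True when initialization may proceed.
-- Hypothetical helper for illustration only.
preInitAllows :: Maybe String -> IO Bool
preInitAllows Nothing = pure True
preInitAllows (Just cmd) = do
  code <- system cmd
  pure (code == ExitSuccess)

main :: IO ()
main = do
  ok <- preInitAllows (Just "exit 1")
  putStrLn (if ok then "initializing" else "initialization blocked")
```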
Joey Hess
a73fa77417
added hooks corresponding to annex.*-command
* Added freezecontent-annex and thawcontent-annex hooks that
  correspond to the git configs annex.freezecontent and
  annex.thawcontent.
* Added secure-erase-annex hook that corresponds to the git config
  annex.secure-erase-command.
* Added commitmessage-annex hook that corresponds to the git config
  annex.commitmessage-command.
* Added http-headers-annex hook that corresponds to the git config
  annex.http-headers-command.

The use case for these is eg, setting up a git repository that is run in a
container, where the easiest way to provide a script is by putting it in
.git/hooks/, rather than copying it into the container in a way that puts
it in PATH.

These are all the ones that make sense to add for annex.*-command git configs.
annex.youtube-dl-command is not a hook, it's telling git-annex what command
to run. So is annex.shared-sop-command. So omitted those.

May later also want to add hooks corresponding to
`remote.<name>.annex-cost-command` etc.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-01-10 14:54:42 -04:00
Joey Hess
5df1b2b36e
configs annex.post-update-command and annex.pre-commit-command
Added git configs annex.post-update-command and annex.pre-commit-command
that correspond to the git-annex hook scripts post-update-annex and
pre-commit-annex.

Note that the hook files take precedence over the git config, since the git
config can include global config which should be overridden by local config.

These new git configs are probably not super useful. In particular, the
pre-commit-annex hook exists so that scripts can be installed there instead
of in the pre-commit hook, since git-annex installs that hook itself. So why would
someone want to use a git config for that? Only reason I can think of would
be in a global git config. Or possibly because it's easier to set a git
config than write a hook script, on an OS like Windows.

The real reason I'm adding these is as groundwork for making other
annex.*-command git configs also be available as hook scripts. I want
to avoid having some things available as only git hooks and others as
both gitconfigs and git hooks. (It seems that some annex.*-command configs
don't translate to git hooks though.)

In the man page, moved documentation of the hooks to be next to the
documentation of the git configs. This is to avoid repetition.
2025-01-10 13:27:51 -04:00
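The precedence rule above (hook file first, git config second) can be sketched as a pure function. `HookSource` and `selectHook` are hypothetical names for illustration, not taken from the git-annex source.

```haskell
-- | Where the command to run comes from. Hypothetical type for this sketch.
data HookSource
  = HookScript FilePath   -- a hook file, eg .git/hooks/pre-commit-annex
  | ConfigCommand String  -- a git config value, eg annex.pre-commit-command
  deriving (Eq, Show)

-- | The hook file wins whenever it exists. The git config is only
-- consulted when there is no hook file.
selectHook :: Maybe FilePath -> Maybe String -> Maybe HookSource
selectHook (Just script) _ = Just (HookScript script)
selectHook Nothing (Just cmd) = Just (ConfigCommand cmd)
selectHook Nothing Nothing = Nothing

main :: IO ()
main = print (selectHook (Just ".git/hooks/pre-commit-annex") (Just "true"))
```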
Joey Hess
0ad3ea3026
fix windows build 2025-01-10 10:37:39 -04:00
Joey Hess
43b35f9493
windows permissions fix
Windows: Fix permission denied error when dropping files that have the
readonly attribute set.

Files coming from a special remote may have had write permission removed
from them. The directory special remote does that. And there are
probably others. So rather than fixing it on the special remote side,
made moveAnnex, on Windows, add back the write bit. This apparently
removes the readonly attribute. See Remote.Directory.removeDirGeneric
which already did the same on windows to allow removing files from the
directory special remote.

The reason that cleanObjectLoc also calls allowWrite is to handle
situations where files have already gotten into git-annex repositories on
Windows with the write bit set. Eg, an older git-annex put them there.
Or perhaps the git-annex repository was populated on some other OS.
2025-01-07 16:37:39 -04:00
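A small sketch of the "add back the write bit before removing" idea, using only System.Directory. `removeReadOnlyFile` is a hypothetical helper. The real change is in moveAnnex and cleanObjectLoc and may differ in detail.

```haskell
import System.Directory
  ( doesFileExist, getPermissions, getTemporaryDirectory
  , removeFile, setOwnerWritable, setPermissions )
import System.FilePath ((</>))

-- | Make the file writable by its owner, then delete it.
-- Hypothetical helper for illustration only.
removeReadOnlyFile :: FilePath -> IO ()
removeReadOnlyFile f = do
  perms <- getPermissions f
  setPermissions f (setOwnerWritable True perms)
  removeFile f

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let f = tmp </> "readonly-demo.txt"
  writeFile f "content"
  -- Simulate a file that arrived with write permission removed.
  perms <- getPermissions f
  setPermissions f (setOwnerWritable False perms)
  removeReadOnlyFile f
  gone <- not <$> doesFileExist f
  putStrLn (if gone then "removed" else "still present")
```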
Joey Hess
29b3c7c660
annex.addunlocked support for tree imports
Honor annex.addunlocked configuration when importing a tree from a special
remote.

Note, in a --no-content import, the object file will not be populated
(usually) and so expressions that match on mime type will not match. Tested
this and it works ok, the file just ends up locked. Updated docs for the
mime expressions to mention that they can't match when the file is not present.

Note that in Command.Sync.pullThirdPartyPopulated, recordImportTree is
called without an AddUnlockedMatcher. Since the tree generated here is not
exposed to the user and does not contain usual filenames, there is no need
of the overhead of checking it.
2024-12-19 11:43:51 -04:00
Joey Hess
f4b2606ff1
comment typo 2024-12-02 14:05:15 -04:00
Joey Hess
8663c72f1e
git-remote-annex: Fix buggy behavior when annex.stalldetection is configured
Make programPath never return "git-remote-annex" or other known multi-call
program names, which are not git-annex and won't behave like it.
If the git-annex binary gets installed under some entirely other name,
it will still return it.

This change exposed that readProgramFile actually could crash,
which happened before only if getExecutablePath was not absolute
and there was no ~/.config/git-annex/program. So fixed that to catch
exception.
2024-11-25 12:14:52 -04:00
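A sketch of the filtering described above, under the assumption that the check is on the file name of the executable path. `usableProgramPath` and `multiCallNames` are hypothetical. Only "git-remote-annex" is named in the commit message, so the list contains just that.

```haskell
import System.FilePath (takeFileName)

-- | Names the binary may be invoked under that do not behave like
-- git-annex. Only the name from the commit message is listed; the real
-- list of multi-call names is not reproduced here.
multiCallNames :: [String]
multiCallNames = ["git-remote-annex"]

-- | Reject paths that end in a known multi-call name. Any other name,
-- however unusual, is accepted.
usableProgramPath :: FilePath -> Maybe FilePath
usableProgramPath p
  | takeFileName p `elem` multiCallNames = Nothing
  | otherwise = Just p

main :: IO ()
main = mapM_ (print . usableProgramPath)
  ["/usr/bin/git-remote-annex", "/usr/bin/git-annex", "/opt/bin/ga"]
```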
Joey Hess
126daf949d
DATA-PRESENT working for exporttree=yes remotes
Since the annex-tracking-branch is pushed first, git-annex has already
updated the export database when the DATA-PRESENT arrives. Which means
that just using checkPresent is enough to verify that there is some file
on the special remote in the export location for the key.

So, the simplest possible implementation of this happened to work!

(I also tested it with chunked special remotes, which also works, as long
as the chunk size used is the same as the configured chunk size. In that
case, the lack of a chunk log is not a problem. Doubtful this will ever
make sense to use with a chunked special remote though, that gets pretty
deep into re-implementing git-annex.)

Updated the client side upload tip with a missing step, and reorganized it for clarity.
2024-10-30 13:55:47 -04:00
Joey Hess
ccbc5189b5
Fix hang when receiving a large file into a proxied special remote
Only indicate that we're done with the bytestring once it all gets written.
Otherwise, the end of it may get garbage collected before we can process
it, leading to a hang.

This seems to have been introduced in commit
cdc4bd7443. Which oddly was trying to fix a
very similar problem, but specific to a cluster node. In that commit,
things got out of order, with it signaling it was done with the bytestring
before it has written all of it to the file.

My test case for this bug is a directory special remote
with a file being sent to it via a proxy accessed via ssh or http.
The file was 10 mb, and it hung on the last few kb of it not being
received.

I've also tested this fix in the case of proxying to a cluster node
directory special remote over http, which was the case
cdc4bd7443 was dealing with.
2024-10-30 12:29:37 -04:00
Joey Hess
f19ebabe89
support DATA-PRESENT when proxying for special remotes
There is a TODO left in the code for exporttree special remotes. If
possible, it should check if one of the export locations contains the
content of the key.
2024-10-29 14:55:31 -04:00
Joey Hess
8baccda98f
Merge branch 'master' into streamproxy 2024-10-22 09:49:28 -04:00
Joey Hess
2c14181bcb
better name for LinkPresentAdjustment 2024-10-21 15:42:01 -04:00
Joey Hess
d9b4bf4224
added retrieveKeyFileInOrder and ORDERED to external special remote protocol
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
2024-10-15 15:40:14 -04:00
Joey Hess
f920d90781
smaller delay in proxy streamer
A one second delay made it seem really choppy and slow when the special
remote was sending content fairly steadily but was bottlenecked on
running gpg on 10 mb chunks.

This does not appreciably increase CPU, although of course if the
special remote is very slow it will add up over time.

It would perhaps be better to use inotify, like tailVerify does.
2024-10-15 14:45:19 -04:00
Joey Hess
54fcc2ec51
fix logic error 2024-10-15 14:28:47 -04:00
Joey Hess
edaed18e4c
Sped up proxied downloads from special remotes, by streaming
Currently works for special remotes that don't use fileRetriever. Ones that
do will download to another filename and rename it into place, defeating
the streaming.

This actually benchmarks slightly slower when getting a large file from
a fast proxied special remote. However, when the proxied special remote
is slow, it will be a big win.
2024-10-15 12:25:15 -04:00
Joey Hess
10216b44d2
use NonEmpty for dirHashes
This avoids 4 uses of head.
2024-09-26 18:15:00 -04:00
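To illustrate why NonEmpty removes the need for partial `head`: a sketch with a hypothetical `candidateDirs` standing in for dirHashes. The directory names are made up.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- | Hypothetical stand-in for dirHashes: the candidate directories for
-- a key, guaranteed by the type to contain at least one entry.
candidateDirs :: String -> NonEmpty FilePath
candidateDirs key = ("new/" ++ key) :| ["old/" ++ key]

-- | NE.head is total, so no runtime check for an empty list is needed.
preferredDir :: String -> FilePath
preferredDir = NE.head . candidateDirs

main :: IO ()
main = putStrLn (preferredDir "KEY")
```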
Joey Hess
783e910d0c
sim: Add metadata command
Only really needed for completeness, preferred content expressions can
match against metadata.
2024-09-26 12:20:37 -04:00
Joey Hess
6a95e4edad
sim: support "--" as comment
Using this in my sim files that are also mdwn files to avoid comments
being displayed as headers.
2024-09-25 14:47:32 -04:00
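A sketch of comment handling in a sim file, assuming that "#" was the existing comment marker (which is what would render as a header in mdwn) and that comments occupy whole lines. `isCommentLine` and `simCommands` are hypothetical names, and the non-comment lines in the example are opaque placeholders.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | A line is a comment when it starts with "#" (assumed pre-existing
-- marker) or "--" (the marker added by this commit).
isCommentLine :: String -> Bool
isCommentLine l = any (`isPrefixOf` dropWhile isSpace l) ["#", "--"]

-- | Keep only the lines that are neither comments nor blank.
simCommands :: String -> [String]
simCommands = filter keep . lines
  where
    keep l = not (isCommentLine l || all isSpace l)

main :: IO ()
main = mapM_ putStrLn (simCommands "-- a note\nfirst command\n# old style\n\nsecond command\n")
```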
Joey Hess
8e94b75a61
support simulating clusters
Without actually simulating cluster implementation at all. Instead, only
the essential fact that cluster gateways know what changes they have
made to each node of a cluster. That is enough for sims like
sizebalanced_cluster.
2024-09-25 14:06:41 -04:00
Joey Hess
b9214d4162
Revert "sim: add commands for cluster management"
This reverts commit 344141da63.

Rethinking this
2024-09-25 12:11:03 -04:00
Joey Hess
344141da63
sim: add commands for cluster management
Clusters are not actually simulated yet.
2024-09-25 11:48:22 -04:00
Joey Hess
8047128591
sim: quiesce before freezing or ending
Probably a good idea for freezing, but especially I hope this fixes a
problem with git-annex sim run that caused it to sometimes crash in
removeDirectoryRecursive with directory not empty, presumably because a
thread was writing there at the same time.
2024-09-24 16:46:09 -04:00
Joey Hess
540bd5e1ab
sim: added run subcommand
And a nice sim of random preferred content expressions.
2024-09-24 12:06:34 -04:00
Joey Hess
9571162057
sim: add stepstable 2024-09-24 11:50:24 -04:00
Joey Hess
4ed58d7894
sim: random preferred content expression generation 2024-09-24 11:23:23 -04:00
Joey Hess
ee3d6502bb
prevent action or step from simulating running on a special remote
Without any connections, the step command will not try to do any actions
on a special remote.

But even without any connections, it's still possible for a drop action
explicitly run "on" the special remote to be performed, when numcopies = 0 or
there is a trusted repo. So guard all actions against running on a
special remote too.
2024-09-24 10:15:56 -04:00
Joey Hess
7cc4312695
fix state overwrite bug
I have needed to exercise a lot of care in threading st through, and I
got it wrong here. Probably using a state monad would be a good idea.
2024-09-24 10:00:38 -04:00
Joey Hess
d3a3c722c9
oops 2024-09-23 16:02:39 -04:00
Joey Hess
eec07aec68
sim: avoid step looking for new actions every time
Once it has a list of actions, it can perform them all.

A disappointing optimisation at least in my test case, which it sped up
by less than 1 second out of 12. But still it did make it faster.
2024-09-23 15:50:47 -04:00
Joey Hess
969e6c2747
sped up sim step by about 200%
Noticed that it was quite slow compared with things like action
sendwanted. Guessed that the slowdown is largely due to every step
doing a simulated git pull/push.

So, rather than always doing a pull/push, only do those when no actions
are found without doing a pull/push.

This does mean that step will sometimes experience a split brain
situation, but that seems like a good thing? Because step ought to
explore as many possible scenarios as it reasonably can.
2024-09-23 15:45:47 -04:00
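The "only pull/push when nothing else is found" idea can be sketched generically. `actionsForStep` is a hypothetical function, with the action finder and the simulated sync passed in as parameters. The real sim code is structured differently.

```haskell
-- | Choose the actions for one step. The expensive sync (standing in
-- for a simulated git pull/push) is applied only when the unsynced
-- state offers no actions.
actionsForStep :: (st -> [act]) -> (st -> st) -> st -> (st, [act])
actionsForStep findActions sync st =
  case findActions st of
    [] -> let st' = sync st in (st', findActions st')
    acts -> (st, acts)

main :: IO ()
main = do
  -- Toy state: an Int, where an action is available once it is positive.
  let find n = ["get" | n > 0]
  print (actionsForStep find (+ 1) (0 :: Int))
  print (actionsForStep find (+ 1) (5 :: Int))
```

Because the sync is skipped whenever some action is already available, a step can act on stale information, which is the split brain situation mentioned above.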
Joey Hess
6b040ed32d
allow complex shell commands 2024-09-23 15:04:33 -04:00
Joey Hess
7bc8c2bfeb
sim visit as first-class command
Allows using it in a sim file.
2024-09-23 13:09:35 -04:00
Joey Hess
6cf9a101b8
sim: Fix size tracking for balanced preferred content 2024-09-23 12:42:32 -04:00
Joey Hess
e9d4cef10f
sim: fix state loss bug 2024-09-20 18:11:37 -04:00
Joey Hess
f5f7b4a936
avoid adding redundant present/notpresent to sim history 2024-09-20 15:45:05 -04:00
Joey Hess
e9c59eceb8
bugfixes
sim stabilization works now
2024-09-20 15:39:52 -04:00
Joey Hess
19b966f0fd
sim: better step
On each step, find all the actions that could be done, and pick one of them
to do.

Should detect stability, but that is broken.
2024-09-20 15:23:34 -04:00
Joey Hess
31679e3e9f
set simRootDirectory on restore
It's a relative directory and the cwd may be different. Or the repo
could have been moved.
2024-09-20 15:11:55 -04:00
Joey Hess
bab330de33
remove sim log file 2024-09-20 15:03:54 -04:00
Joey Hess
f061ae92fb
sim: implement addtree 2024-09-20 10:34:52 -04:00
Joey Hess
6751f23978
sim: fix get bug
When getting from a remote, have to check that the repo doing the
getting thinks the remote contains the key, but also that the remote
actually does. Before this bug fix, it would get from a repo that used
to have the key, but that had dropped it since the last git pull.
2024-09-17 14:29:49 -04:00
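A sketch of the double check described above, with a deliberately simplified model: one getter, one remote, and keys held in plain lists. `Repo`, `believedAtRemote`, `actuallyPresent` and `canGetFrom` are all hypothetical names.

```haskell
type Key = String

-- | Simplified repository state for this sketch.
data Repo = Repo
  { believedAtRemote :: [Key]  -- what this repo thinks the remote has, as of its last pull
  , actuallyPresent :: [Key]   -- what this repo really contains
  }

-- | A get can only succeed when the getter believes the remote has the
-- key and the remote really does still have it.
canGetFrom :: Repo -> Repo -> Key -> Bool
canGetFrom getter remote k =
  k `elem` believedAtRemote getter && k `elem` actuallyPresent remote

main :: IO ()
main = do
  let getter = Repo { believedAtRemote = ["k1", "k2"], actuallyPresent = [] }
      -- The remote dropped k2 after the getter's last pull.
      remote = Repo { believedAtRemote = [], actuallyPresent = ["k1"] }
  print (map (canGetFrom getter remote) ["k1", "k2"])
```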
Joey Hess
e568ac96b7
record initial seed in sim log
Unless the log starts with a command that records a seed.
2024-09-17 13:49:50 -04:00
Joey Hess
02f0996e25
git-annex sim log 2024-09-17 13:43:11 -04:00
Joey Hess
b85965cb3c
sim: implement dropunwantedfrom 2024-09-17 13:35:35 -04:00