Commit graph

834 commits

Author SHA1 Message Date
Joey Hess
73060eea51
annex.fastcopy
Added annex.fastcopy and remote.name.annex-fastcopy config setting. When
set, this allows the copy_file_range syscall to be used, which can eg allow
for server-side copies on NFS. (For fastest copying, also disable
annex.verify or remote.name.annex-verify.)

This is a simple implementation, that does not handle resuming as well as
it possibly could.

It can be used with both local git remotes (including on NFS), and
directory special remotes. Other types of remotes could in theory also
support it, so I've left the config documented as a general thing.
2025-06-03 15:01:38 -04:00
Joey Hess
d416107c7d
improve comment 2025-04-11 11:17:09 -04:00
Joey Hess
1313cc4d60
mask remotes, partial implementation
Everything implemented except for passing through to the masked remote.
Which should be trivial.
2025-04-10 13:10:07 -04:00
Joey Hess
e81fd72018
Added remote.name.annex-web-options config
Which is a per-remote version of the annex.web-options config.

Had to plumb RemoteGitConfig through to getUrlOptions. In cases where a
special remote does not use curl, there was no need to do that and I used
Nothing instead.

In the case of the addurl and importfeed commands, it seemed best to say
that running these commands is not using the web special remote per se,
so the config is not used for those commands.
2025-04-01 10:17:38 -04:00
Joey Hess
52f51d065a
rename config to annex.security.allowed-compute-programs
And require for enable as well as autoenable.

It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.

Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.
2025-03-03 16:12:03 -04:00
Joey Hess
f32d2aecce
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
2025-03-03 15:52:56 -04:00
Joey Hess
2ff716be30
OsPath build flag no longer depends on filepath-bytestring
However, filepath-bytestring is still in Setup-Depends.
That's because Utility.OsPath uses it when not built with OsPath.
It would be maybe possible to make Utility.OsPath fall back to using
filepath, and eliminate that dependency too, but it would mean either
wrapping all of System.FilePath's functions, or using `type OsPath = FilePath`

Annex.Import uses ifdefs to avoid converting back to FilePath when not
on windows. On windows it's a bit slower due to that conversion.
Utility.Path.Windows.convertToWindowsNativeNamespace got a bit
slower too, but not really worth optimising I think.

Note that importing Utility.FileSystemEncoding at the same time as
System.Posix.ByteString will result in conflicting definitions for
RawFilePath. filepath-bytestring avoids that by importing RawFilePath
from System.Posix.ByteString, but that's not possible in
Utility.FileSystemEncoding, since Setup-Depends does not include unix.
This turned out not to affect any code in git-annex though.

Sponsored-by: Leon Schuermann
2025-02-10 16:39:55 -04:00
Joey Hess
5eef09a3cc
more OsPath conversion (650/749)
Sponsored-by: Nicholas Golder-Manning
2025-02-07 17:03:31 -04:00
Joey Hess
a5d48edd94
more OsPath conversion (602/749)
Sponsored-by: Brock Spratlen
2025-02-07 14:46:11 -04:00
Joey Hess
0d2b805806
more OsPath conversion (520/749)
Sponsored-by: mycroft
2025-02-05 15:07:59 -04:00
Joey Hess
4dc904bbad
more OsPath conversion
Sponsored-by: Leon Schuermann
2025-02-04 16:09:47 -04:00
Joey Hess
54f0710fd2
more OsPath conversion (464/749)
Sponsored-by: unqueued
2025-02-04 13:35:17 -04:00
Joey Hess
71195cce13
more OsPath conversion
Sponsored-by: k0ld
2025-02-01 14:06:38 -04:00
Joey Hess
474cf3bc8b
more OsPath conversion
Sponsored-by: Brock Spratlen
2025-02-01 11:54:19 -04:00
Joey Hess
c69e57aede
more OsPath conversion
Sponsored-by: Jack Hill
2025-01-30 15:46:32 -04:00
Joey Hess
27305042f3
more OsPath conversion
Sponsored-by: Nicholas Golder-Manning
2025-01-29 11:53:20 -04:00
Joey Hess
0376bc5ee0
more OsPath conversion
Sponsored-by: Luke T. Shumaker
2025-01-28 16:31:19 -04:00
Joey Hess
22c2451e26
more OsPath conversion
Sponsored-by: mycroft
2025-01-28 15:46:00 -04:00
Joey Hess
917c43f31f
Merge /home/joey/tmp/git-annex into ospath 2025-01-28 15:29:58 -04:00
Joey Hess
87cda29dd7 remove Read instance for AssociatedFile
This instance is not used.
2025-01-28 15:29:25 -04:00
Joey Hess
7ebef6cd1b
more OsPath conversion
keyFile has a nice improvement; since a Key is a ShortByteString, it can
be converted to an OsPath without needing the copy that was done before.

Unfortunately, fileKey has to convert from a ShortByteString to a
ByteString in order to use attoparsec, and then the results get
converted back to an OsPath, so there are now 2 copies.
Maybe attoparsec will eventually get a ShortByteString API,
see https://github.com/haskell/attoparsec/issues/225

Sponsored-by: Joshua Antonishen
2025-01-27 16:55:07 -04:00
Joey Hess
8bafe05500
more OsPath conversion 2025-01-27 10:13:43 -04:00
Joey Hess
5bca78b813
OsPath conversion
Decent win in exportDirectories, since it operates on ShortByteString
end to end now without needing conversion. That made it worth
implementing an OsPath specific code path there.

And ExportLocation already being a ShortByteString is an good example of why
it's a good thing that OsPath uses that!

Sponsored-by: k0ld on Patreon
2025-01-25 11:53:47 -04:00
Joey Hess
6e27b0d4d1
convert from readFileStrict
This removes that function, using file-io readFile' instead.

Had to deal with newline conversion, which readFileStrict does on
Windows. In a few cases, that was pretty ugly to deal with.

Sponsored-by: Kevin Mueller
2025-01-22 16:20:36 -04:00
Joey Hess
1ceece3108
RawFilePath conversion of System.Directory
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.

Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.

Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
2025-01-20 19:17:33 -04:00
Joey Hess
42d55bc57c
pre-init config and hook
Added annex.pre-init-command git config and pre-init-annex hook that is run
before git-annex repository initialization.

This can block initialization. Or it can preform pre-initialization
configuration or tweaking.

I left stdio connected while it's running, so it could also be used for
interactive prompting conceivably, although that would want to use /dev/tty
anyway probably in order to not pollute the stdout of a command when
automatic initialization is done.

Sponsored-by: Dartmouth College's OpenNeuro project
2025-01-13 14:22:49 -04:00
Joey Hess
5df1b2b36e
configs annex.post-update-command and annex.pre-commit-command
Added git configs annex.post-update-command and annex.pre-commit-command
that correspond to the git-annex hook scripts post-update-annex and
pre-commit-annex.

Note that the hook files take precience over the git config, since the git
config can includ global config which should be overridden by local config.

These new git configs are probably not super useful. Especially the
pre-commit-annex hook is there to install scripts to instead of the
pre-commit hook, since git-annex installs that hook itself. So why would
someone want to use a git config for that? Only reason I can think of would
be in a global git config. Or possibly because it's easier to set a git
config than write a hook script, on an OS like Windows.

The real reason I'm adding these is as groundwork for making other
annex.*-command git configs also be available as hook scripts. I want
to avoid having some things available as only git hooks and others as
both gitconfigs and git hooks. (It seems that some annex.*-command configs
don't translate to git hooks though.)

In the man page, moved documentation of the hooks to be next to the
documentation of the git configs. This is to avoid repitition.
2025-01-10 13:27:51 -04:00
Joey Hess
dd052dcba1
annexInsteadOf config
Added config `url.<base>.annexInsteadOf` corresponding to git's
`url.<base>.pushInsteadOf`, to configure the urls to use for accessing the
git-annex repositories on a server without needing to configure
remote.name.annexUrl in each repository.

While one use case for this would be rewriting urls to use annex+http,
I decided not to add any kind of special case for that. So while
git-annex p2phttp, when serving multiple repositories, needs an url
of eg "annex+http://example.com/git-annex/ for each of them, rewriting an
url like "https://example.com/git/foo/bar" with this config set to
"https://example.com/git/" will result in eg
"annex+http://example.com/git-annex/foo/bar", which p2phttp does not
support.

That seems better dealt with in either git-annex p2phttp or a http
middleware, rather than complicating the config with a special case for
annex+http.

Anyway, there are other use cases for this that don't involve annex+http.
2024-12-03 14:39:07 -04:00
Joey Hess
8baccda98f
Merge branch 'master' into streamproxy 2024-10-22 09:49:28 -04:00
Joey Hess
2c14181bcb
better name for LinkPresentAdjustment 2024-10-21 15:42:01 -04:00
Joey Hess
82e91b380a
add GITMANIFEST to parseKeyVariety
git-remote-annex: Fix bug that prevented using it with external special
remotes, leading to protocol error messages involving "GITMANIFEST".
2024-10-19 17:12:23 -04:00
Joey Hess
d9b4bf4224
added retrieveKeyFileInOrder and ORDERED to external special remote protocol
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
2024-10-15 15:40:14 -04:00
Joey Hess
835283b862
stream through proxy when using fileRetriever
The problem was that when the proxy requests a key be retrieved to its
own temp file, fileRetriever was retriving it to the key's temp
location, and then moving it at the end, which broke streaming.

So, plumb through the path where the key is being retrieved to.
2024-10-15 14:29:06 -04:00
Joey Hess
783e910d0c
sim: Add metadata command
Only really needed for completeness, preferred content expressions can
match against metadata.
2024-09-26 12:20:37 -04:00
Joey Hess
6cf9a101b8
sim: Fix size tracking for balanced preferred content 2024-09-23 12:42:32 -04:00
Joey Hess
52891711d2
git-annex sim command is working
Had to add Read instances to Key and NumCopies and some other similar
types. I only expect to use those in serializing a sim. Of course, this
risks that implementation changes break reading old data. For a sim,
that would not be a big problem.
2024-09-12 16:10:52 -04:00
Joey Hess
7e8274c6b7
implemented ActionDropUnwanted
Not tested yet. This emulates the same checking that is done when
dropping. Note that when dropping from a special remote it is not able
to make a locked copy.
2024-09-12 10:44:31 -04:00
Joey Hess
4e11cb19ef
implemented cloneSimRepo
Started on updateSimRepoState
2024-09-06 14:23:29 -04:00
Joey Hess
5807e1480c
correct comment
This is not related to v5 versus newer versions.
2024-09-03 14:23:32 -04:00
Joey Hess
340bdd0dac
treat "not present" in preferred content as invalid
Detect when a preferred content expression contains "not present", which
would lead to repeatedly getting and then dropping files, and make it never
match. This also applies to "not balanced" and "not sizebalanced".

--explain will tell the user when this happens

Note that getMatcher calls matchMrun' and does not check for unstable
negated limits. While there is no --present anyway, if there was,
it would not make sense for --not --present to complain about
instability and fail to match.
2024-09-03 13:50:06 -04:00
Joey Hess
35ff8c8c00
use Utility.PID
fixes build on i386ancient
2024-08-30 14:56:38 -04:00
Joey Hess
f89a1b8216
remove stale live changes from reposize database
Reorganized the reposize database directory, and split up a column.

checkStaleSizeChanges needs to run before needLiveUpdate,
otherwise the process won't be holding a lock on its pid file, and
another process could go in and expire the live update it records. It
just so happens that they do get called in the correct order, since
checking balanced preferred content calls getLiveRepoSizes before
needLiveUpdate.

The 1 minute delay between checks is arbitrary, but will avoid excess
work. The downside of it is that, if a process is dropping a file and
gets interrupted, for 1 minute another process can expect a repository
will soon be smaller than it is. And so a process might send data to a
repository when a file is not really going to be dropped from it. But
note that can already happen if a drop takes some time in eg locking and
then fails. So it seems possible that live updates should only be
allowed to increase, rather than decrease the size of a repository.
2024-08-28 13:57:25 -04:00
Joey Hess
e006acef22
avoid reposize database locking overhead when not needed
Only when the preferred content expression being matched uses balanced
preferred content is this overhead needed.

It might be possible to eliminate the locking entirely. Eg, check the
live changes before and after the action and re-run if they are not
stable. For now, this is good enough, it avoids existing preferred
content getting slow. If balanced preferred content turns out to be too
slow to check, that could be tried later.
2024-08-28 10:52:34 -04:00
Joey Hess
4d2f95853d
closing in on finishing live reposizes
Fixed successfullyFinishedLiveSizeChange to not update the rolling total
when a redundant change is in RecentChanges.

Made setRepoSizes clear RecentChanges that are no longer needed.
It might be possible to clear those earlier, this is only a convenient
point to do it.

The reason it's safe to clear RecentChanges here is that, in order for a
live update to call successfullyFinishedLiveSizeChange, a change must be
made to a location log. If a RecentChange gets cleared, and just after
that a new live update is started, making the same change, the location
log has already been changed (since the RecentChange exists), and
so when the live update succeeds, it won't call
successfullyFinishedLiveSizeChange. The reason it doesn't
clear RecentChanges when there is a reduntant live update is because
I didn't want to think through whether or not all races are avoided in
that case.

The rolling total in SizeChanges is never cleared. Instead,
calcJournalledRepoSizes gets the initial value of it, and then
getLiveRepoSizes subtracts that initial value from the current value.
Since the rolling total can only be updated by updateRepoSize,
which is called with the journal locked, locking the journal in
calcJournalledRepoSizes ensures that the database does not change while
reading the journal.
2024-08-27 12:54:46 -04:00
Joey Hess
d7813876a0
fixed the build
Manually tested getLiveRepoSizes and it is working correctly.
2024-08-27 09:41:35 -04:00
Joey Hess
521e0a7062
fix a deadlock
When finishedLiveUpdate was run on a different key than expected, it
blocked forever waiting for an indication the database had been updated.

Since the journal is locked when finishedLiveUpdate runs, this could
also have caused other git-annex commands to hang.
2024-08-27 00:13:54 -04:00
Joey Hess
18f8d61f55
rolling total of size changes in RepoSize database
When a live size change completes successfully, the same transaction
that removes it from the database updates the rolling total for its
repository.

The idea is that when RepoSizes is read, SizeChanges will be as
well, and cached locally. Any time a change is made, the local cache
will be updated. So by comparing the local cache with the current
SizeChanges, it can learn about size changes that were made by other
processes. Then read the LiveSizeChanges, and add that in to get a live
picture of the current sizes.

Also added a SizeChangeId. This allows 2 different threads, or
processes, to both record a live size change for the same repo and key,
and update their own information without stepping on one-another's toes.
2024-08-25 10:34:47 -04:00
Joey Hess
d60a33fd13
improve live update starting
In an expression like "balanced=foo and exclude=bar", avoid it starting
a live update when the overall expression doesn't match.
2024-08-24 13:07:05 -04:00
Joey Hess
2f20b939b7
LiveUpdate db updates working
I've tested the behavior of the thread that waits for the LiveUpdate to
be finished, and it does get signaled and exit cleanly when the
LiveUpdate is GCed instead.

Made finishedLiveUpdate wait for the thread to finish updating the
database.

There is a case where GC doesn't happen in time and the database is left
with a live update recorded in it. This should not be a problem as such
stale data can also happen when interrupted and will need to be detected
when loading the database.

Balanced preferred content expressions now call startLiveUpdate.
2024-08-24 11:49:58 -04:00
Joey Hess
c3d40b9ec3
plumb in LiveUpdate (WIP)
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.

So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.

There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.

Not currently in a compilable state.
2024-08-23 16:35:12 -04:00