Commit graph

3026 commits

Author SHA1 Message Date
Joey Hess
51538fa0a8
improve error message when unable to get an input file
In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.

computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.
2025-03-04 13:13:18 -04:00
Joey Hess
b395bd4f56
move showOutput into compute remote 2025-03-04 10:02:33 -04:00
Joey Hess
89bfeada87
recompute: display one of the changed files 2025-03-03 15:12:19 -04:00
Joey Hess
b01a0d2323
avoid recomputing every time on git inputs 2025-03-03 14:56:49 -04:00
Joey Hess
a0d6a6ea2a
support git files as input to computations
Using GIT keys, like are used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.

Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.

Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repisitory.
2025-03-03 12:09:25 -04:00
Joey Hess
6ebab7fb00
factor out Annex.GitShaKey 2025-03-03 11:09:28 -04:00
Joey Hess
63d73d8d1b
record VURL key hashes in addcomputed and recompute 2025-03-03 10:57:56 -04:00
Joey Hess
b813549b2d
fix build 2025-02-27 16:18:04 -04:00
Joey Hess
e6ae5e8d56
many recompute improvements
I've lost track of them all, but it includes:

* Using the same key backend as was used in the original computation.
* Fixing bug that prevented updating the source file key in the compute
  state
* Handling --reproducible and --unreproducible.
* recompute --original of a file using VURL, when the result is
  different, but the key remains the same, makes the object file
  be updated with the new content
* Detecting some other ways the program behavior can change, just for
  completeness.
* Also adds --backend to addcomputed.
2025-02-27 15:18:27 -04:00
Joey Hess
9c2c3002a6
fix recompute of renamed files
When a computed file has been renamed, a recompute needs to write to the
new filename.

I decided to remove --others because it's not clear what it should do in
the face of renames. Should it update only other files that have not
been renamed? Or update files that use the old key to the new key
anywhere in the tree? Or write the other files to the cwd, ignoring
renames? Since --others is just a way to save on compute time, adding
this complexity at this point seems like a bad idea. May revisit later.

Added temporary TODO-compute file
2025-02-27 11:27:26 -04:00
Joey Hess
5d2a608a56
todo 2025-02-26 15:59:47 -04:00
Joey Hess
d6a010a615
recompute closer to working properly
Proper behavior without --others implemented.

And eliminated most of the code duplication through refactoring.

Also, changed it to not stage recomputed files. This way, git diff will
show files that have differences.
2025-02-26 15:52:52 -04:00
Joey Hess
53d107ca47
refactor 2025-02-26 14:05:37 -04:00
Joey Hess
3bec89a3c3
started git-annex recompute
The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
2025-02-26 11:54:09 -04:00
Joey Hess
d49f371acc
showOutput
when the compute program eg displays usage, it needs to start on its own
line
2025-02-26 09:47:56 -04:00
Joey Hess
eed522a0f8
addcomputed inherits extra initremote parameters
This is limited because the remote config is a field/value map. So order
is not preserved, and when 2 parameters have the same field name, only
the last one will be passed.
2025-02-26 09:45:35 -04:00
Joey Hess
a5b53fa98a
todo 2025-02-25 18:45:55 -04:00
Joey Hess
e702cb94ff
add compute remote uuid to compute state url
Otherwise, two different compute remotes that happen to take the same
input would use the same compute state url. Which seems wrong.
2025-02-25 18:44:40 -04:00
Joey Hess
71e92a509a
use compute program REPRODUCIBLE by default 2025-02-25 17:10:41 -04:00
Joey Hess
233a6954b9
ingest when --unreproducible is used without --fast 2025-02-25 17:04:19 -04:00
Joey Hess
16f529c05f
addcomputed --fast and --unreproducible working
For these, use VURL and URL keys, with an "annex-compute:" URI prefix.

These URL keys will look something like this:

	URL--annex-compute&cbar4,63pconvert,3-f4d3d72cf3f16ac9c3e9a8012bde4462

Generally it's too long so most of it gets md5summed. It's a little
ugly, but it's what fell out of the existing URL key generation
machinery. I did consider special casing to eg
"URL--annex-compute&c4d3d72cf3f16ac9c3e9a8012bde4462". But it seems at
least possibly useful that the name of the file that was computed is
visible and perhaps one or two words of the git-annex compute command
parameters.

Note that two different output files from the same computation will get
the same URL key. And these keys should remain stable.
2025-02-25 16:43:15 -04:00
Joey Hess
a154e91513
add git-annex addcomputed
Working pretty well. Mostly. But:

* Does not yet support inputs that are non-annexed files checked into git
* --fast is currently broken (will need something like VURL keys)
* --unreproducible still uses a checksumming backend, so drop and get
  again will likely fail (needs probably to use an URL key or something
  like one)

The compute special remote seems to work pretty well too. Eg,
getting from it works, and dropping content that is present in it works.
2025-02-25 15:50:08 -04:00
Joey Hess
4f1eea9061
remove unused adjustedBranchRefresh associated file parameter 2025-02-21 14:51:02 -04:00
Joey Hess
f8bb9a8734
replace removeLink with removeFile
same reasoning as in commit 5cc8d9d03b
2025-02-11 13:41:26 -04:00
Joey Hess
3bbabd6778
replace R.doesPathExist with doesPathExist
Equivilant, just avoids some ugliness.
2025-02-11 12:46:54 -04:00
Joey Hess
2ff716be30
OsPath build flag no longer depends on filepath-bytestring
However, filepath-bytestring is still in Setup-Depends.
That's because Utility.OsPath uses it when not built with OsPath.
It would be maybe possible to make Utility.OsPath fall back to using
filepath, and eliminate that dependency too, but it would mean either
wrapping all of System.FilePath's functions, or using `type OsPath = FilePath`

Annex.Import uses ifdefs to avoid converting back to FilePath when not
on windows. On windows it's a bit slower due to that conversion.
Utility.Path.Windows.convertToWindowsNativeNamespace got a bit
slower too, but not really worth optimising I think.

Note that importing Utility.FileSystemEncoding at the same time as
System.Posix.ByteString will result in conflicting definitions for
RawFilePath. filepath-bytestring avoids that by importing RawFilePath
from System.Posix.ByteString, but that's not possible in
Utility.FileSystemEncoding, since Setup-Depends does not include unix.
This turned out not to affect any code in git-annex though.

Sponsored-by: Leon Schuermann
2025-02-10 16:39:55 -04:00
Joey Hess
c730d00b6e
more OsPath conversion (749/749)
Builds with and without OsPath build flag.

Unfortunately, the test suite fails.

Sponsored-by: unqueued on Patreon
2025-02-10 14:59:20 -04:00
Joey Hess
2d224e0d28
more OsPath conversion (658/749)
At this point the test suite builds, and mostly the assistant is left.

Sponsored-by: unqueued
2025-02-08 15:27:44 -04:00
Joey Hess
5eef09a3cc
more OsPath conversion (650/749)
Sponsored-by: Nicholas Golder-Manning
2025-02-07 17:03:31 -04:00
Joey Hess
c74c75b352
more OsPath conversion (639/749)
Sponsored-by: k0ld
2025-02-07 16:07:05 -04:00
Joey Hess
a5d48edd94
more OsPath conversion (602/749)
Sponsored-by: Brock Spratlen
2025-02-07 14:46:11 -04:00
Joey Hess
2d1db7986c
more OsPath conversion (572/749)
Sponsored-by: Jack Hill
2025-02-06 16:18:52 -04:00
Joey Hess
0811531b59
more OsPath conversion (542/749)
Sponsored-by: Luke T. Shumaker
2025-02-06 11:38:14 -04:00
Joey Hess
77e9781ae2
parsePOSIXTime ByteString conversion
Some easy (though tiny) speed wins.

Sponsored-by: Luke T. Shumaker on Patreon
2025-01-22 16:42:09 -04:00
Joey Hess
6e27b0d4d1
convert from readFileStrict
This removes that function, using file-io readFile' instead.

Had to deal with newline conversion, which readFileStrict does on
Windows. In a few cases, that was pretty ugly to deal with.

Sponsored-by: Kevin Mueller
2025-01-22 16:20:36 -04:00
Joey Hess
9b79f0f43d
use file-io for readFile/writeFile/appendFile on ByteStrings
These are all straightforward, and easy small performance wins.

Sponsored-by: Nicholas Golder-Manning
2025-01-22 14:30:25 -04:00
Joey Hess
90cd3aad37
RawFilePath conversion for replaceFile
Sponsored-by: Joshua Antonishen
2025-01-22 13:37:26 -04:00
Joey Hess
f17ec601c4
optimize truncateFilePath
Often the filepath will be all ascii, or mostly so, and this
optimisation makes a file that has an ascii suffix of sufficient length
be roundtrip converted between String and ByteString only once, rather
than once per character.

Sponsored-by: Graham Spencer
2025-01-22 13:09:15 -04:00
Joey Hess
793ddecd4b
use openTempFile from file-io
And follow-on changes.

Note that relatedTemplate was changed to operate on a RawFilePath, and
so when it counts the length, it is now the number of bytes, not the
number of code points. This will just make it truncate shorter strings
in some cases, the truncation is still unicode aware.

When not building with the OsPath flag, toOsPath . fromRawFilePath and
fromRawFilePath . fromOsPath do extra conversions back and forth between
String and ByteString. That overhead could be avoided, but that's the
non-optimised build mode, so didn't bother.

Sponsored-by: unqueued
2025-01-22 11:41:43 -04:00
Joey Hess
1faa3af9cd
add file-io to build-depends when building with OsPath flag
Partly converted code to use functions from it, though more remain
unconverted. Most of withFile and openFile now use it.
2025-01-21 14:26:04 -04:00
Joey Hess
1ceece3108
RawFilePath conversion of System.Directory
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.

Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.

Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
2025-01-20 19:17:33 -04:00
Joey Hess
9e4314de76
relax annex-tracking-branch to allow "/"
Allow setting remote.foo.annex-tracking-branch to a branch name that
contains "/", as long as it's not a remote tracking branch.
2025-01-20 11:31:18 -04:00
Joey Hess
5df1b2b36e
configs annex.post-update-command and annex.pre-commit-command
Added git configs annex.post-update-command and annex.pre-commit-command
that correspond to the git-annex hook scripts post-update-annex and
pre-commit-annex.

Note that the hook files take precience over the git config, since the git
config can includ global config which should be overridden by local config.

These new git configs are probably not super useful. Especially the
pre-commit-annex hook is there to install scripts to instead of the
pre-commit hook, since git-annex installs that hook itself. So why would
someone want to use a git config for that? Only reason I can think of would
be in a global git config. Or possibly because it's easier to set a git
config than write a hook script, on an OS like Windows.

The real reason I'm adding these is as groundwork for making other
annex.*-command git configs also be available as hook scripts. I want
to avoid having some things available as only git hooks and others as
both gitconfigs and git hooks. (It seems that some annex.*-command configs
don't translate to git hooks though.)

In the man page, moved documentation of the hooks to be next to the
documentation of the git configs. This is to avoid repitition.
2025-01-10 13:27:51 -04:00
Joey Hess
0815c82bb1
log: Support --key, as well as --branch and --unused
--all remains a special case, since it is more efficient and displays in a
nicer order.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-01-03 15:45:42 -04:00
Joey Hess
da5e195597
remove i386ancient and need at least debian stable to build
* Removed the i386ancient standalone tarball build for linux, which
  was increasingly unable to support new git-annex features.
* Removed support for building with ghc older than 9.0.2,
  and with older versions of haskell libraries than are in current Debian
  stable.
* stack.yaml: Update to lts-23.2.

Note that i386ancient was targeting linux 2.6.32, which has been EOL for
over 9 years now. Any old system still using such a kernel is certainly highly
insecure. And I suspect i386ancient had its own insecurities due to haskell
libraries and C libraries not having been updated.
2025-01-01 14:15:55 -04:00
Joey Hess
29b3c7c660
annex.addunlocked support for tree imports
Honor annex.addunlocked configuration when importing a tree from a special
remote.

Note, in a --no-content import, the object file will not be populated
(usually) and so expressions that match on mime type will not match. Tested
this and it works ok, the file just ends up locked. Updated docs for the
mime expressions to mention that they can't match when the file is present

Note that in Command.Sync.pullThirdPartyPopulated, recordImportTree is
called without a AddUnlockedMatcher. Since the tree generated here is not
exposed to the user and does not contain usual filenames, there is no need
of the overhead of checking it.
2024-12-19 11:43:51 -04:00
Joey Hess
7d8558548b
empty preferred content
* Document that settting preferred content to "" is the same as the
  default unset behavior.
* sync: Avoid misleading warning about future preferred content
  transition when preferred content is set to "".
2024-12-13 13:26:48 -04:00
Joey Hess
4c785c338a
p2phttp: notice when new repositories are added to --directory
When a uuid is not known, rescan for new repositories. Easy.

When a repository is removed, it will also get removed from the server
state on the next scan. But until a new uuid is seen, there will not be
a scan. This leaves the server trying to serve a uuid whose repository
is gone. That seems buggy. While getting just fails, dropping fails the
first time, but seems to leave the server in an unusable state, so the
next drop attempt hangs. The server is still able to serve other uuids,
only the one whose repository was removed has that problem.
2024-11-21 15:09:12 -04:00
Joey Hess
758ea89c74
skip over repositories in --directory that do not have annex.uuid set 2024-11-21 14:18:18 -04:00
Joey Hess
3c18398d5a
p2phttp support --jobs with --directory
--jobs is usually an Annex option setter, but --directory runs in IO, so
would not have that available. So instead moved the option parser into
the command's Options.
2024-11-21 14:15:14 -04:00