Merge branch 'master' into ospath

2025-02-14 16:28:43 -04:00 · 2025-02-14 16:28:43 -04:00 · e8b00faea8
commit e8b00faea8
parent 406527570e 56d28eebdc
8 changed files with 181 additions and 1 deletion
--- a/Build/OSXMkLibs.hs
+++ b/Build/OSXMkLibs.hs
@@ -88,7 +88,8 @@ installLibs appbase installedbins replacement_libs libmap = do
 -}
 otool :: FilePath -> M.Map FilePath FilePath -> [(FilePath, FilePath)] -> LibMap -> IO ([FilePath], [(FilePath, FilePath)], LibMap)
 otool appbase installedbins replacement_libs libmap = do
-	files <- filterM doesFileExist =<< dirContentsRecursive appbase
+	files <- filterM doesFileExist
+		=<< (map fromRawFilePath <$> dirContentsRecursive (toRawFilePath appbase))
 	process [] files replacement_libs libmap
  where
 	want s = 
--- a/doc/design/compute_special_remote_interface.mdwn
+++ b/doc/design/compute_special_remote_interface.mdwn
@@ -0,0 +1,80 @@
 **draft**
 The [[special_remotes/compute]] special remote uses this interface to run
 compute programs.
 When a compute special remote is initremoted, a program is specified:
    git-annex initremote myremote type=compute program=foo
 That causes `git-annex-compute-foo` to be run to get files from that
 compute special remote.
 The environment variable `ANNEX_COMPUTE_KEY` is the key that the program
 is requested to compute.
 The program is run in a temporary directory, which will be cleaned up after it
 exits. When it generates the content of a key, it should write it to a file
 with the same name as the key, in that directory. Then it should
 output the key on a line to stdout.
 While usually this will be the requested key, the program can output any
 number of other keys as well, all of which will be stored in the git-annex
 repository when getting files from the compute special remote. When a
 computation generates several files, this allows running it a single time
 to get them all.
 The program is passed environment variables to provide inputs to the
 computation. These are all prefixed with `"ANNEX_COMPUTE_"`.
 The names are taken from the `git-annex addcomputed` command that was used to
 add a computed file to the repository.
 For example, this command:
    git-annex addcomputed file.gen --to foo \
        --input raw=file.raw --value passes=10
 Will result in this environment:
    ANNEX_COMPUTE_KEY=SHA256--...
    ANNEX_COMPUTE_raw=file.raw
    ANNEX_COMPUTE_INPUT_raw=/path/.git/annex/objects/..
    ANNEX_COMPUTE_passes=10
 For security, the program should avoid passing values from `ANNEX_COMPUTE_*`
 variables to a shell unquoted, or otherwise executing them.
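A minimal POSIX shell sketch of the difference, assuming a hostile value has been placed in `ANNEX_COMPUTE_passes` purely for demonstration (the value is made up, not something git-annex produces): a quoted expansion reaches a command as one literal argument, a `case` check can reject anything that is not a plain number, and `eval` (shown only in a comment) would re-parse the value and run the embedded command.

```sh
#!/bin/sh
# Sketch: handling ANNEX_COMPUTE_* values safely in a POSIX shell.
# The value below is a deliberately hostile example, not real git-annex output.
ANNEX_COMPUTE_passes='10; touch pwned'
export ANNEX_COMPUTE_passes

# Safe: the quoted expansion is passed as one literal argument and is
# never re-parsed by the shell.
count_args() { echo "$#"; }
safe_args=$(count_args "$ANNEX_COMPUTE_passes")
echo "quoted expansion gives $safe_args argument(s)"

# Safe validation: accept only a non-empty plain decimal number.
case "$ANNEX_COMPUTE_passes" in
	''|*[!0-9]*) valid=no ;;
	*) valid=yes ;;
esac
echo "passes is numeric: $valid"

# Unsafe (do NOT do this): eval re-parses the value, so the
# "touch pwned" part would run as a separate command.
#   eval "frobnicate --passes=$ANNEX_COMPUTE_passes"
```

Validating inputs up front like this also lets the program fail early with a clear message on stderr instead of passing garbage to the computation.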
 The program will also inherit other environment variables
 that were set when git-annex was run, like PATH.
 Anything that the program outputs to stderr will be displayed to the user.
 Stderr should be used for error messages, and possibly for computation
 output, but not for progress displays, since git-annex has its own progress
 displays.
 If possible, the program should write the content of the key it is
 generating directly to the file, rather than writing it somewhere else and
 renaming it at the end. If git-annex sees that the file for the key it
 requested is growing, it will use the file's size to display progress to
 the user.
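As a rough illustration, assuming a made-up key name `SHA256--example` and a loop that stands in for real work, the sketch below creates the key file up front and appends each chunk to it as soon as it is produced, so the file grows while the program runs instead of appearing all at once.

```sh
#!/bin/sh
# Sketch: write output directly to the key file, appending as it is
# produced, so its growing size can be observed while the program runs.
# The key name and the generated data are stand-ins for illustration only.
set -e
ANNEX_COMPUTE_KEY=${ANNEX_COMPUTE_KEY:-SHA256--example}

: >"$ANNEX_COMPUTE_KEY"    # create (or truncate) the key file up front
i=1
while [ "$i" -le 4 ]; do
	# Each chunk is appended as soon as it is "computed".
	printf 'chunk %d\n' "$i" >>"$ANNEX_COMPUTE_KEY"
	i=$((i + 1))
done

# Preferred over: compute >tmpfile && mv tmpfile "$ANNEX_COMPUTE_KEY",
# which would make the file appear only at the very end.
echo "$ANNEX_COMPUTE_KEY"
```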
 Alternatively, if the program outputs a number on a line to stdout, this is
 taken to be the number of bytes of the requested key that have been computed
 so far. Or the program can output a percentage, e.g. "50%", on a line to
 stdout to indicate what percent of the computation has been performed so far.
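The following sketch shows one way a program might use the percentage form, assuming a hypothetical four-step computation: it prints a percentage line after each step and finally prints the key on its own line. The function name `run_compute` and the key `SHA256--example` are placeholders, not part of the interface.

```sh
#!/bin/sh
# Sketch: report progress on stdout while computing, then output the key.
# Per the draft interface, a line like "50%" is read as percent complete.
# The work loop, function name and default key are hypothetical.
set -e

run_compute() {
	key=${ANNEX_COMPUTE_KEY:-SHA256--example}
	total_steps=4
	step=0
	: >"$key"
	while [ "$step" -lt "$total_steps" ]; do
		printf 'step %d\n' "$step" >>"$key"     # stand-in for real work
		step=$((step + 1))
		echo "$((step * 100 / total_steps))%"   # progress line on stdout
	done
	echo "$key"                                 # the key, on its own line
}

run_compute
```

When run, it prints `25%`, `50%`, `75%` and `100%`, then the key.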
 If the program exits nonzero, nothing it computed will be stored in the 
 git-annex repository.
 An example `git-annex-compute-foo` shell script follows:
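    #!/bin/sh
    set -e
    # Both inputs come from the addcomputed example above.
    if [ -z "$ANNEX_COMPUTE_passes" ] || [ -z "$ANNEX_COMPUTE_INPUT_raw" ]; then
        echo "Missing expected inputs" >&2
        exit 1
    fi
    # Write the result to a file named after the requested key, in the
    # temporary working directory, then announce the key on stdout.
    frobnicate --passes="$ANNEX_COMPUTE_passes" \
        <"$ANNEX_COMPUTE_INPUT_raw" >"$ANNEX_COMPUTE_KEY"
    echo "$ANNEX_COMPUTE_KEY"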
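To try a compute program outside git-annex, a small harness can approximate what the draft interface describes: create a temporary directory, set the `ANNEX_COMPUTE_*` variables, run the program there, and check that it both wrote and printed the key. This is only a sketch for manual testing, not how git-annex itself runs programs. Since `frobnicate` is a placeholder, it uses a hypothetical stand-in program that simply copies its input, and the key and file names are made up.

```sh
#!/bin/sh
# Sketch: exercise a compute program by hand, approximating the draft
# interface. The stand-in program, key name and input file are hypothetical.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for git-annex-compute-foo: copies its input.
cat >"$work/compute-standin" <<'EOF'
#!/bin/sh
set -e
if [ -z "$ANNEX_COMPUTE_INPUT_raw" ]; then
	echo "Missing expected inputs" >&2
	exit 1
fi
cat <"$ANNEX_COMPUTE_INPUT_raw" >"$ANNEX_COMPUTE_KEY"
echo "$ANNEX_COMPUTE_KEY"
EOF
chmod +x "$work/compute-standin"

printf 'raw data\n' >"$work/file.raw"
mkdir "$work/run"

# Run the program inside its own temporary directory, with the inputs
# passed as environment variables.
announced=$(cd "$work/run" &&
	ANNEX_COMPUTE_KEY=SHA256--example \
	ANNEX_COMPUTE_raw=file.raw \
	ANNEX_COMPUTE_INPUT_raw="$work/file.raw" \
	ANNEX_COMPUTE_passes=10 \
	"$work/compute-standin")

# The program must both create the key file and print the key.
[ "$announced" = SHA256--example ]
[ -f "$work/run/SHA256--example" ]
echo "ok: program wrote and announced $announced"
```

To exercise a real program, replace `$work/compute-standin` with the path to `git-annex-compute-foo` and adjust the variables to match its inputs.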
--- a/doc/forum/Relocating_annex_directory/comment_3_3bf363a066a3b5db93d592bbc6566974._comment
+++ b/doc/forum/Relocating_annex_directory/comment_3_3bf363a066a3b5db93d592bbc6566974._comment
@@ -0,0 +1,8 @@
 [[!comment format=mdwn
 username="anarcat"
 avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
 subject="similar topic"
 date="2025-02-14T17:51:29Z"
 content="""
 see also [[moving_annex_across_filesystems]]
 """]]
--- a/doc/forum/best_way_to_move_a_git_annex_repo_trought_file_system/comment_4_9ddedf0f603b0aaefd716e7f80296022._comment
+++ b/doc/forum/best_way_to_move_a_git_annex_repo_trought_file_system/comment_4_9ddedf0f603b0aaefd716e7f80296022._comment
@@ -0,0 +1,8 @@
 [[!comment format=mdwn
 username="anarcat"
 avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
 subject="similar topic"
 date="2025-02-14T17:47:01Z"
 content="""
 see also [[forum/moving_annex_across_filesystems]]
 """]]
--- a/doc/forum/moving_annex_across_filesystems.mdwn
+++ b/doc/forum/moving_annex_across_filesystems.mdwn
@@ -24,3 +24,20 @@ I believe that's because rm won't fail to remove files if they are readonly when
 Anyways - what's the proper way of doing this? I know I could `git clone` the repository and `git get` everything, but that would create another repository with a new UUID. That's duplication I do not want.
 Thanks for the advice! -- [[anarcat]]
 Update, years later... The problem with cloning is that it pollutes the history of the git repository, with all that location information duplicated for a repository that is, effectively, immediately forgotten.
 That said, it's quite nice to use git itself to move the repository, as it provides a more reliable way to do this:
    cd /srv
    git clone ~/Photos
    cd Photos
    git annex get
 As long as you don't `git-annex-sync`, the new UUID isn't sent back, and I guess it's *possible* to use `git-annex-reinit` to reuse the old UUID, but I'm not sure that helps with the extra metadata already created.
 In some cases, however, this is actually what you want: you *are* creating a new repository, even if you're removing the old one. I've found that the safest way to do those transfers is to clone, as `mv(1)` can sometimes fail halfway, leaving an inconsistent copy that has to be redone from scratch.
 Furthermore, while it has been stated elsewhere ([[forum/best_way_to_move_a_git_annex_repo_trought_file_system]], [[forum/Relocating_annex_directory]]) that a git-annex "repository is just a collection of files in a directory", I would argue it's not *quite* true. A git-annex repository is quite peculiar: it has hidden files, readonly files and directories, and can have symbolic links. And while those might seem perfectly normal to a seasoned UNIX programmer or system administrator, they trigger a bunch of special edge cases that might confuse a lot of people (like broken links, permission denied errors when removing folders, etc).
 The idea that git-annex is "just a normal folder" is nice in theory, but it breaks down in some edge cases, and I think it's important for people to be aware of that, especially when doing special operations like this.
--- a/doc/todo/compute_special_remote/comment_15_35bc3e4093591f0551433c3a26abd333._comment
+++ b/doc/todo/compute_special_remote/comment_15_35bc3e4093591f0551433c3a26abd333._comment
@@ -0,0 +1,24 @@
 [[!comment format=mdwn
 username="joey"
 subject="""Re: comment 13"""
 date="2025-02-13T16:36:45Z"
 content="""
 @m.risse, earlier you said that it would be bad to:
 > Silently use the old version of "data.grib", creating a mismatch between
 > "data.nc" and "data.grib"
 That's what I was getting at when I said:
 > But if you already have data.nc file present in a repository, it does not
 > get updated immediately when you update the source "data.grib" file.
 So just using files from HEAD for the computation is not sufficient to
 avoid this kind of mismatch. The user will need some workflow to deal with
 it.
 Eg, they could recompute data.nc whenever data.grib is updated, and so make a
 commit that updates both files together. But if they're doing that, why does
 the computation need to use files from HEAD? Recomputing data.nc could just as
 well pin the new key of data.grib.
 """]]
--- a/doc/todo/compute_special_remote/comment_16_54204f341ab2c1203d7092cca8fb6b1d._comment
+++ b/doc/todo/compute_special_remote/comment_16_54204f341ab2c1203d7092cca8fb6b1d._comment
@@ -0,0 +1,34 @@
 [[!comment format=mdwn
 username="joey"
 subject="""Re: crossing repository boundaries"""
 date="2025-02-13T17:01:52Z"
 content="""
 It could be argued that git-annex should recurse into submodules.
 Oddly, I don't remember that anyone has ever tried to make that argument.
 If they did it was a long time ago. It may be that datalad has relieved
 enough of the pressure in that area that it's not bothering many people.
 Anyway, I wouldn't want to tie compute special remotes to changing
 git-annex in that way, but I also wouldn't want to rule out adding
 useful stuff to git-annex just because it breaches the submodule boundary
 in a way that's new to git-annex.
 Thinking about a command like this:
 	git-annex addcomputed foo --to ffmpeg-cut \
 	    --input source=submodule/input.mov \
 	    --value starttime=15:00 \
 	    --value endtime=30:00
 That would need to look inside the submodule to find the input key.
 When getting the key later, it can't rely on the tree still containing the
 same submodules at the same locations. `git mv submodule foo` would break
 the computation. 
 I think that can be dealt with by having it fall back to checking location
 logs of all submodules, to find the submodule that knows about a key.
 Deleting a submodule would still break the computation, and that seems
 difficult to avoid. Seems acceptable.
 """]]
--- a/doc/todo/compute_special_remote/comment_17_8bc2bd7f93ac98d55b69ec1e4e6fa487._comment
+++ b/doc/todo/compute_special_remote/comment_17_8bc2bd7f93ac98d55b69ec1e4e6fa487._comment
@@ -0,0 +1,8 @@
 [[!comment format=mdwn
 username="joey"
 subject="""comment 17"""
 date="2025-02-13T20:10:52Z"
 content="""
 I've written up a draft interface for programs used by a compute special
 remote: [[design/compute_special_remote_interface]]
 """]]