diff --git a/Build/OSXMkLibs.hs b/Build/OSXMkLibs.hs
index 5df85a757b..de5f4335d9 100644
--- a/Build/OSXMkLibs.hs
+++ b/Build/OSXMkLibs.hs
@@ -88,7 +88,8 @@ installLibs appbase installedbins replacement_libs libmap = do
  -}
 otool :: FilePath -> M.Map FilePath FilePath -> [(FilePath, FilePath)] -> LibMap -> IO ([FilePath], [(FilePath, FilePath)], LibMap)
 otool appbase installedbins replacement_libs libmap = do
-	files <- filterM doesFileExist =<< dirContentsRecursive appbase
+	files <- filterM doesFileExist
+		=<< (map fromRawFilePath <$> dirContentsRecursive (toRawFilePath appbase))
 	process [] files replacement_libs libmap
   where
 	want s =
diff --git a/doc/design/compute_special_remote_interface.mdwn b/doc/design/compute_special_remote_interface.mdwn
new file mode 100644
index 0000000000..63bc253493
--- /dev/null
+++ b/doc/design/compute_special_remote_interface.mdwn
@@ -0,0 +1,80 @@
+**draft**
+
+The [[special_remotes/compute]] special remote uses this interface to run
+compute programs.
+
+When a compute special remote is initremoted, a program is specified:
+
+    git-annex initremote myremote type=compute program=foo
+
+That causes `git-annex-compute-foo` to be run to get files from that
+compute special remote.
+
+The environment variable `ANNEX_COMPUTE_KEY` is the key that the program
+is requested to compute.
+
+The program is run in a temporary directory, which will be cleaned up after it
+exits. When it generates the content of a key, it should write it to a file
+with the same name as the key, in that directory. Then it should
+output the key in a line to stdout.
+
+While usually this will be the requested key, the program can output any
+number of other keys as well, all of which will be stored in the git-annex
+repository when getting files from the compute special remote. When a
+computation generates several files, this allows running it a single time
+to get them all.
+
+The program is passed environment variables to provide inputs to the
+computation.
These are all prefixed with `"ANNEX_COMPUTE_"`.
+
+The names are taken from the `git-annex addcomputed` command that was used to
+add a computed file to the repository.
+
+For example, this command:
+
+    git-annex addcomputed file.gen --to foo \
+        --input raw=file.raw --value passes=10
+
+Will result in this environment:
+
+    ANNEX_COMPUTE_KEY=SHA256--...
+    ANNEX_COMPUTE_raw=file.raw
+    ANNEX_COMPUTE_INPUT_raw=/path/.git/annex/objects/..
+    ANNEX_COMPUTE_passes=10
+
+For security, the program should avoid exposing values from `ANNEX_COMPUTE_*`
+variables to the shell unprotected, or otherwise executing them.
+
+The program will also inherit other environment variables
+that were set when git-annex was run, like PATH.
+
+Anything that the program outputs to stderr will be displayed to the user.
+This stderr should be used for error messages, and possibly computation
+output, but not for progress displays, since git-annex has its own progress
+displays.
+
+If possible, the program should write the content of the key it is
+generating directly to the file, rather than writing to somewhere else and
+renaming it at the end. If git-annex sees that the file corresponding to
+the key it requested be computed is growing, it will use the file size when
+displaying progress to the user.
+
+Alternatively, if the program outputs a number on a line to stdout, this is
+taken to be the number of bytes of the requested key that have been computed
+so far. Or, the program can output a percentage eg "50%" on a line to stdout
+to indicate what percent of the computation has been performed so far.
+
+If the program exits nonzero, nothing it computed will be stored in the
+git-annex repository.
+
+An example `git-annex-compute-foo` shell script follows:
+
+    #!/bin/sh
+    set -e
+    if [ -z "$ANNEX_COMPUTE_passes" ] || [ -z "$ANNEX_COMPUTE_INPUT_raw" ]; then
+        echo "Missing expected inputs" >&2
+        exit 1
+    fi
+    frobnicate --passes="$ANNEX_COMPUTE_passes" \
+        <"$ANNEX_COMPUTE_INPUT_raw" >"$ANNEX_COMPUTE_KEY"
+    echo "$ANNEX_COMPUTE_KEY"
diff --git a/doc/forum/Relocating_annex_directory/comment_3_3bf363a066a3b5db93d592bbc6566974._comment b/doc/forum/Relocating_annex_directory/comment_3_3bf363a066a3b5db93d592bbc6566974._comment
new file mode 100644
index 0000000000..3baaf6518c
--- /dev/null
+++ b/doc/forum/Relocating_annex_directory/comment_3_3bf363a066a3b5db93d592bbc6566974._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="anarcat"
+ avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
+ subject="similar topic"
+ date="2025-02-14T17:51:29Z"
+ content="""
+ see also [[moving_annex_across_filesystems]]
+"""]]
diff --git a/doc/forum/best_way_to_move_a_git_annex_repo_trought_file_system/comment_4_9ddedf0f603b0aaefd716e7f80296022._comment b/doc/forum/best_way_to_move_a_git_annex_repo_trought_file_system/comment_4_9ddedf0f603b0aaefd716e7f80296022._comment
new file mode 100644
index 0000000000..ae6ee1d0c4
--- /dev/null
+++ b/doc/forum/best_way_to_move_a_git_annex_repo_trought_file_system/comment_4_9ddedf0f603b0aaefd716e7f80296022._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="anarcat"
+ avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
+ subject="similar topic"
+ date="2025-02-14T17:47:01Z"
+ content="""
+see also [[forum/moving_annex_across_filesystems]]
+"""]]
diff --git a/doc/forum/moving_annex_across_filesystems.mdwn b/doc/forum/moving_annex_across_filesystems.mdwn
index 28a6852f7d..2f5594a389 100644
--- a/doc/forum/moving_annex_across_filesystems.mdwn
+++ b/doc/forum/moving_annex_across_filesystems.mdwn
@@ -24,3 +24,20 @@ I believe that's because rm won't fail to remove files if they are readonly
when
 Anyways - what's the proper way of doing this? I know I could `git clone` the repository and `git get` everything, but that would create another repository with a new UUID. That's duplication I do not want.
 
 Thanks for the advice! -- [[anarcat]]
+
+Update, years later... The problem with cloning is that it pollutes the history of the git repository, with all that location information duplicated for a repo that is effectively, immediately forgotten.
+
+That said, it's quite nice to use git itself to move the repository, as it provides a more reliable way to do this:
+
+    cd /srv
+    git clone ~/Photos
+    cd Photos
+    git annex get
+
+As long as you don't `git-annex-sync`, you don't send the UUID back and I guess it's *possible* to `git-annex-reinit` to recycle the UUID, but I'm not sure it helps with the extra metadata created.
+
+In some cases, however, this is actually what you want: you *are* creating a new repository, even if you're removing the old one. I've found that the actual, safest way to do those transfers is to clone, as sometimes `mv(1)` can fail halfway and then you have an inconsistent copy and you need to restart from scratch.
+
+Furthermore, while it has been stated elsewhere ([[forum/best_way_to_move_a_git_annex_repo_trought_file_system]], [[forum/Relocating_annex_directory]]) that a git-annex "repository is just a collection of files in a directory", I would argue it's not *quite* true. A git-annex repository is quite peculiar: it has hidden files, readonly files and directories, and can have symbolic links. And while those might seem perfectly normal to a seasoned UNIX programmer or system administrator, they trigger a bunch of special edge cases that might confuse a lot of people (like broken links, permission denied errors when removing folders, etc).
+
+The idea that git-annex is "just a normal folder" is nice in theory, but it breaks down in some edge cases, and I think it's important for people to be aware of that, especially when doing special operations like this.
diff --git a/doc/todo/compute_special_remote/comment_15_35bc3e4093591f0551433c3a26abd333._comment b/doc/todo/compute_special_remote/comment_15_35bc3e4093591f0551433c3a26abd333._comment
new file mode 100644
index 0000000000..8d6f8cad53
--- /dev/null
+++ b/doc/todo/compute_special_remote/comment_15_35bc3e4093591f0551433c3a26abd333._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: comment 13"""
+ date="2025-02-13T16:36:45Z"
+ content="""
+@m.risse earlier you said that it would be bad to
+
+> Silently use the old version of "data.grib", creating a mismatch between
+> "data.nc" and "data.grib"
+
+That's what I was getting at when I said:
+
+> But if you already have data.nc file present in a repository, it does not
+> get updated immediately when you update the source "data.grib" file.
+
+So just using files from HEAD for the computation is not sufficient to
+avoid this kind of mismatch. The user will need some workflow to deal with
+it.
+
+Eg, they could recompute data.nc whenever data.grib is updated, and so make a
+commit that updates both files together. But if they're doing that, why does
+the computation need to use files from HEAD? Recomputing data.nc could just as
+well pin the new key of data.grib.
+"""]]
diff --git a/doc/todo/compute_special_remote/comment_16_54204f341ab2c1203d7092cca8fb6b1d._comment b/doc/todo/compute_special_remote/comment_16_54204f341ab2c1203d7092cca8fb6b1d._comment
new file mode 100644
index 0000000000..3e75ed18e7
--- /dev/null
+++ b/doc/todo/compute_special_remote/comment_16_54204f341ab2c1203d7092cca8fb6b1d._comment
@@ -0,0 +1,34 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: crossing repository boundaries"""
+ date="2025-02-13T17:01:52Z"
+ content="""
+It could be argued that git-annex should recurse into submodules.
+Oddly, I don't remember that anyone has ever tried to make that argument.
+If they did it was a long time ago. It may be that datalad has relieved
+enough of the pressure in that area that it's not bothering many people.
+
+Anyway, I wouldn't want to tie compute special remotes to changing
+git-annex in that way, but I also wouldn't want to rule out adding
+useful stuff to git-annex just because it breaches the submodule boundary
+in a way that's new to git-annex.
+
+Thinking about a command like this:
+
+    git-annex addcomputed foo --to ffmpeg-cut \
+        --input source=submodule/input.mov \
+        --value starttime=15:00 \
+        --value endtime=30:00
+
+That would need to look inside the submodule to find the input key.
+
+When getting the key later, it can't rely on the tree still containing the
+same submodules at the same locations. `git mv submodule foo` would break
+the computation.
+
+I think that can be dealt with by having it fall back to checking location
+logs of all submodules, to find the submodule that knows about a key.
+
+Deleting a submodule would still break the computation, and that seems
+difficult to avoid. Seems acceptable.
+"""]]
diff --git a/doc/todo/compute_special_remote/comment_17_8bc2bd7f93ac98d55b69ec1e4e6fa487._comment b/doc/todo/compute_special_remote/comment_17_8bc2bd7f93ac98d55b69ec1e4e6fa487._comment
new file mode 100644
index 0000000000..5735dfa1c6
--- /dev/null
+++ b/doc/todo/compute_special_remote/comment_17_8bc2bd7f93ac98d55b69ec1e4e6fa487._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 17"""
+ date="2025-02-13T20:10:52Z"
+ content="""
+I've written up a draft interface for programs used by a compute special
+remote: [[design/compute_special_remote_interface]]
+"""]]