Merge branch 'master' into ospath
commit e8b00faea8
8 changed files with 181 additions and 1 deletions

@@ -88,7 +88,8 @@ installLibs appbase installedbins replacement_libs libmap = do
  -}
 otool :: FilePath -> M.Map FilePath FilePath -> [(FilePath, FilePath)] -> LibMap -> IO ([FilePath], [(FilePath, FilePath)], LibMap)
 otool appbase installedbins replacement_libs libmap = do
-	files <- filterM doesFileExist =<< dirContentsRecursive appbase
+	files <- filterM doesFileExist
+		=<< (map fromRawFilePath <$> dirContentsRecursive (toRawFilePath appbase))
 	process [] files replacement_libs libmap
   where
 	want s =

doc/design/compute_special_remote_interface.mdwn (Normal file, 80 lines changed)
@@ -0,0 +1,80 @@
**draft**

The [[special_remotes/compute]] special remote uses this interface to run
compute programs.

When a compute special remote is initremoted, a program is specified:

    git-annex initremote myremote type=compute program=foo

That causes `git-annex-compute-foo` to be run to get files from that
compute special remote.

The environment variable `ANNEX_COMPUTE_KEY` holds the key that the program
is requested to compute.

The program is run in a temporary directory, which will be cleaned up after it
exits. When it generates the content of a key, it should write it to a file
with the same name as the key, in that directory. Then it should
output the key on a line to stdout.

While usually this will be the requested key, the program can output any
number of other keys as well, all of which will be stored in the git-annex
repository when getting files from the compute special remote. When a
computation generates several files, this allows running it a single time
to get them all.
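
As a rough illustration of the multi-key case, here is a minimal sketch of a
program that produces two keys in one run. The `emit_key` helper and the
`EXAMPLE--...` key names are hypothetical placeholders, not part of this
interface; this draft does not say how a program learns the names of
additional keys, so the extra key is simply hard-coded.

```shell
#!/bin/sh
# Sketch only: key names below are placeholders, and how a program would
# learn the names of extra keys is an assumption left open by the draft.
set -e

# Write the content read from stdin to a file named after the key, in the
# current directory, then announce the key on a line to stdout.
emit_key() {
    key="$1"
    cat >"$key"
    echo "$key"
}

# Stand-in for the temporary directory the program would be run in.
workdir="$(mktemp -d)"
cd "$workdir"

# Use the requested key if set, otherwise a placeholder for this demo.
ANNEX_COMPUTE_KEY="${ANNEX_COMPUTE_KEY:-EXAMPLE--requested}"

# The requested key first, then one additional key from the same run.
printf 'main output\n' | emit_key "$ANNEX_COMPUTE_KEY"
printf 'side output\n' | emit_key "EXAMPLE--extra"
```

Each key is announced only after its file has been completely written, so a
key that appears on stdout always has a corresponding file in the directory.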

The program is passed environment variables to provide inputs to the
computation. These are all prefixed with `ANNEX_COMPUTE_`.

The names are taken from the `git-annex addcomputed` command that was used to
add a computed file to the repository.

For example, this command:

    git-annex addcomputed file.gen --to foo \
        --input raw=file.raw --value passes=10

will result in this environment:

    ANNEX_COMPUTE_KEY=SHA256--...
    ANNEX_COMPUTE_raw=file.raw
    ANNEX_COMPUTE_INPUT_raw=/path/.git/annex/objects/..
    ANNEX_COMPUTE_passes=10

For security, the program should avoid exposing values from `ANNEX_COMPUTE_*`
variables to the shell unprotected, or otherwise executing them.
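
A minimal sketch of what "protected" can mean in a shell program, assuming
that `passes` is expected to be a plain number. The hostile value and the
`show_args` helper are hypothetical and exist only for this demonstration;
`frobnicate` is the placeholder command from the example in this document.

```shell
#!/bin/sh
# Hypothetical hostile value, as if it had arrived via the environment.
ANNEX_COMPUTE_passes='10; echo INJECTED'

# Unsafe (do not do this): eval re-parses the value as shell code, so the
# part after the semicolon would run as a separate command.
#   eval "frobnicate --passes=$ANNEX_COMPUTE_passes"

# Safer: first check the value against what the computation expects.
# Here that is assumed to be a non-empty string of digits.
case "$ANNEX_COMPUTE_passes" in
    ''|*[!0-9]*) valid=no ;;
    *) valid=yes ;;
esac
echo "valid=$valid"

# And always expand it inside double quotes, so that it stays a single
# argument and is never split or interpreted by the shell.
show_args() { printf '%s\n' "$#"; }
show_args --passes="$ANNEX_COMPUTE_passes"
```

With the quoted expansion, the helper receives exactly one argument even
though the value contains a space and a semicolon.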

The program will also inherit other environment variables
that were set when git-annex was run, like `PATH`.

Anything that the program outputs to stderr will be displayed to the user.
Stderr should be used for error messages, and possibly computation
output, but not for progress displays, since git-annex has its own progress
displays.

If possible, the program should write the content of the key it is
generating directly to the file, rather than writing to somewhere else and
renaming it at the end. If git-annex sees that the file corresponding to
the key it requested be computed is growing, it will use the file size when
displaying progress to the user.

Alternatively, if the program outputs a number on a line to stdout, this is
taken to be the number of bytes of the requested key that have been computed
so far. Or, the program can output a percentage, e.g. "50%", on a line to
stdout to indicate what percent of the computation has been performed so far.
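
A minimal sketch of the percentage form of progress reporting, assuming a
computation that proceeds in a known number of steps. The `report_progress`
helper and the step count are hypothetical.

```shell
#!/bin/sh
# Print a percentage line such as "50%" for step $1 out of $2 total steps.
report_progress() {
    echo "$(( $1 * 100 / $2 ))%"
}

total=4
step=1
while [ "$step" -le "$total" ]; do
    # One unit of the real computation would run here.
    report_progress "$step" "$total"
    step=$((step + 1))
done
```

A program that knows how many bytes it has produced, but not how far along
it is overall, could print the byte count on a line instead.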

If the program exits nonzero, nothing it computed will be stored in the
git-annex repository.

An example `git-annex-compute-foo` shell script follows:

    #!/bin/sh
    set -e
    # Each test needs its own brackets; || is not valid inside [ ].
    if [ -z "$ANNEX_COMPUTE_passes" ] || [ -z "$ANNEX_COMPUTE_INPUT_raw" ]; then
        echo "Missing expected inputs" >&2
        exit 1
    fi
    # Write directly to the file named after the key, then announce the key.
    frobnicate --passes="$ANNEX_COMPUTE_passes" \
        <"$ANNEX_COMPUTE_INPUT_raw" >"$ANNEX_COMPUTE_KEY"
    echo "$ANNEX_COMPUTE_KEY"

@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="anarcat"
 avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
 subject="similar topic"
 date="2025-02-14T17:51:29Z"
 content="""
 see also [[moving_annex_across_filesystems]]
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="anarcat"
 avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
 subject="similar topic"
 date="2025-02-14T17:47:01Z"
 content="""
see also [[forum/moving_annex_across_filesystems]]
"""]]

@@ -24,3 +24,20 @@ I believe that's because rm won't fail to remove files if they are readonly when
Anyways - what's the proper way of doing this? I know I could `git clone` the repository and `git annex get` everything, but that would create another repository with a new UUID. That's duplication I do not want.

Thanks for the advice! -- [[anarcat]]

Update, years later...  The problem with cloning is that it pollutes the history of the git repository, with all that location information duplicated for a repo that is effectively, immediately forgotten.

That said, it's quite nice to use git itself to move the repository, as it provides a more reliable way to do this:

    cd /srv
    git clone ~/Photos
    cd Photos
    git annex get

As long as you don't `git-annex-sync`, you don't send the UUID back, and I guess it's *possible* to `git-annex-reinit` to recycle the UUID, but I'm not sure it helps with the extra metadata created.

In some cases, however, this is actually what you want: you *are* creating a new repository, even if you're removing the old one. I've found that the safest way to do those transfers is to clone, as `mv(1)` can sometimes fail halfway, leaving you with an inconsistent copy that forces you to restart from scratch.

Furthermore, while it has been stated elsewhere ([[forum/best_way_to_move_a_git_annex_repo_trought_file_system]], [[forum/Relocating_annex_directory]]) that a git-annex "repository is just a collection of files in a directory", I would argue it's not *quite* true. A git-annex repository is quite peculiar: it has hidden files, readonly files and directories, and can have symbolic links. And while those might seem perfectly normal to a seasoned UNIX programmer or system administrator, they trigger a bunch of special edge cases that might confuse a lot of people (like broken links, permission denied errors when removing folders, etc).

The idea that git-annex is "just a normal folder" is nice in theory, but it breaks down in some edge cases, and I think it's important for people to be aware of that, especially when doing special operations like this.

@@ -0,0 +1,24 @@
[[!comment format=mdwn
 username="joey"
 subject="""Re: comment 13"""
 date="2025-02-13T16:36:45Z"
 content="""
@m.risse earlier you said that it would be bad to

> Silently use the old version of "data.grib", creating a mismatch between
> "data.nc" and "data.grib"

That's what I was getting at when I said:

> But if you already have data.nc file present in a repository, it does not
> get updated immediately when you update the source "data.grib" file.

So just using files from HEAD for the computation is not sufficient to
avoid this kind of mismatch. The user will need some workflow to deal with
it.

Eg, they could recompute data.nc whenever data.grib is updated, and so make a
commit that updates both files together. But if they're doing that, why does
the computation need to use files from HEAD? Recomputing data.nc could just as
well pin the new key of data.grib.
"""]]

@@ -0,0 +1,34 @@
[[!comment format=mdwn
 username="joey"
 subject="""Re: crossing repository boundaries"""
 date="2025-02-13T17:01:52Z"
 content="""
It could be argued that git-annex should recurse into submodules.
Oddly, I don't remember that anyone has ever tried to make that argument.
If they did it was a long time ago. It may be that datalad has relieved
enough of the pressure in that area that it's not bothering many people.

Anyway, I wouldn't want to tie compute special remotes to changing
git-annex in that way, but I also wouldn't want to rule out adding
useful stuff to git-annex just because it breaches the submodule boundary
in a way that's new to git-annex.

Thinking about a command like this:

	git-annex addcomputed foo --to ffmpeg-cut \
	    --input source=submodule/input.mov \
	    --value starttime=15:00 \
	    --value endtime=30:00

That would need to look inside the submodule to find the input key.

When getting the key later, it can't rely on the tree still containing the
same submodules at the same locations. `git mv submodule foo` would break
the computation.

I think that can be dealt with by having it fall back to checking location
logs of all submodules, to find the submodule that knows about a key.

Deleting a submodule would still break the computation, and that seems
difficult to avoid. Seems acceptable.
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 17"""
 date="2025-02-13T20:10:52Z"
 content="""
I've written up a draft interface for programs used by a compute special
remote: [[design/compute_special_remote_interface]]
"""]]