Merge branch 'master' into proxy
commit 5aaa285083
7 changed files with 100 additions and 1 deletion

doc/bugs/git_annex_unannex_-_some_files_still_symlinked.mdwn (Normal file)
							|  | @ -0,0 +1,35 @@ | |||
| ### Please describe the problem. | ||||
| 
 | ||||
| 1. Some files remain symlinked after an aborted `git annex add` and a completed `git annex unannex`. | ||||
| 2. These files are present in `.git/annex/objects`, but `git annex unused` does not find them. Running `git annex whereused --key=SHA256E...` returns nothing. | ||||
| 
 | ||||
| Restoring the files and removing them from the git-annex objects folder requires manual workarounds or hacks, such as adding a file again with `git annex add` and then trying to remove it again. | ||||
| 
 | ||||
| ### What steps will reproduce the problem? | ||||
| 
 | ||||
| 1. Run `git annex add` and abort the operation midway (this was on a directory with a large number of files, ~3K, running with 12 jobs). | ||||
| 2. Run `git annex unannex` until done. | ||||
| 3. Find that some of the added files were restored, while others are still symlinked but are not tracked by git-annex. | ||||
| 
 | ||||
| 
 | ||||
| ### What version of git-annex are you using? On what operating system? | ||||
| 
 | ||||
| Debian Bookworm / git-annex version: 10.20240227-1 | ||||
| 
 | ||||
| ### Please provide any additional information below. | ||||
| 
 | ||||
| Similar report from another user here: | ||||
| https://git-annex.branchable.com/forum/File_still_symlinked_after_git_annex_unannex/ | ||||
| 
 | ||||
| [[!format sh """ | ||||
| # If you can, paste a complete transcript of the problem occurring here. | ||||
| # If the problem is with the git-annex assistant, paste in .git/annex/daemon.log | ||||
| 
 | ||||
| 
 | ||||
| # End of transcript or log. | ||||
| """]] | ||||
| 
 | ||||
| ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) | ||||
| 
 | ||||
| 
 | ||||
| Yes, using it extensively for a few years with terabytes of data | ||||
|  | @ -0,0 +1,22 @@ | |||
| [[!comment format=mdwn | ||||
|  username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" | ||||
|  nickname="ruslan" | ||||
|  avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" | ||||
|  subject="comment 1" | ||||
|  date="2024-06-05T17:34:32Z" | ||||
|  content=""" | ||||
| A solution that involves running `git annex add` is also described at the link below: | ||||
| 
 | ||||
| https://git-annex.branchable.com/forum/git_annex_add_crash_and_subsequent_recovery/#comment-4f5af644597a055624009c5bbb9aca3f | ||||
| 
 | ||||
| --- | ||||
| 
 | ||||
| So I need to find files that are symlinks into the git-annex objects folder and run `git annex add` / `git annex unused`. I can handle that with a script, though it would be nice to have a built-in method. | ||||
| 
 | ||||
| --- | ||||
| 
 | ||||
| Additional notes: | ||||
| 
 | ||||
| 1. There should be a way to find files that were added to the git-annex objects folder but are not tracked by git-annex. Is this something that can be done with existing commands? | ||||
| 2. It's desirable to have a way to abort `git annex add` gracefully on long-running jobs. Is there a way to do it now? It looks like Ctrl-C resulted in a broken state. Would Ctrl-Z work better? | ||||
| """]] | ||||
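The script mentioned in the comment above could look something like the following. This is only a minimal sketch, not a git-annex feature: the function name `find_annex_symlinks` is hypothetical, and it assumes that an annexed file is a symlink whose stored target contains `.git/annex/objects`. It only lists candidates; checking whether git actually tracks each one is left out.

```python
import os


def find_annex_symlinks(root):
    """Return sorted paths of symlinks under root that point into .git/annex/objects.

    Assumption: annexed files are symlinks whose stored target contains
    the path fragment ".git/annex/objects". Broken links are included too,
    since the link target is read without following it.
    """
    marker = os.path.join(".git", "annex", "objects")
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the repository's own .git directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and marker in os.readlink(path):
                found.append(path)
    return sorted(found)
```

Calling `find_annex_symlinks(".")` from the top of a working tree and printing each result would give a list of files to feed to whatever recovery step is chosen.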
|  | @ -0,0 +1,3 @@ | |||
| As I understand it, there is currently no way to track metadata for directories with `git annex metadata` (it only works for files). Is that indeed the case? | ||||
| 
 | ||||
| One workaround I'm looking at is to add a metadata placeholder file for directory metadata inside the directory. As I understand it, each directory would need to have such a file with some unique content (perhaps a UUID); otherwise the metadata of placeholder files in different directories would collide. Are there alternatives or better solutions for tracking dataset metadata (groups of files in a folder)? | ||||
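The placeholder workaround described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the file name `.dirmeta`, the helper name `ensure_placeholder`, and the choice of a random UUID as the unique content. The point of the unique content is that two placeholder files with identical bytes would otherwise end up sharing one set of metadata.

```python
import os
import uuid

# Hypothetical placeholder file name; nothing in git-annex defines it.
PLACEHOLDER_NAME = ".dirmeta"


def ensure_placeholder(directory):
    """Create a placeholder file with unique content in directory, if missing.

    Returns the path of the placeholder. An existing placeholder is left
    untouched so that its content stays stable across runs.
    """
    path = os.path.join(directory, PLACEHOLDER_NAME)
    if not os.path.exists(path):
        with open(path, "w") as handle:
            # A random UUID keeps the content distinct between directories.
            handle.write(str(uuid.uuid4()) + "\n")
    return path
```

After adding the placeholder with `git annex add`, directory-level fields could then be attached to it with `git annex metadata --set field=value` on that file, in the same way as for any other annexed file.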
|  | @ -0,0 +1,8 @@ | |||
| [[!comment format=mdwn | ||||
|  username="nobodyinperson" | ||||
|  avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" | ||||
|  subject="comment 1" | ||||
|  date="2024-06-06T09:09:03Z" | ||||
|  content=""" | ||||
| You are absolutely right. You might be interested in [DataLad](https://datalad.org), which provides a lot of convenience around git-annex, has the concept of datasets (git submodules) and also an extended approach to metadata. | ||||
| """]] | ||||
|  | @ -0,0 +1,15 @@ | |||
| [[!comment format=mdwn | ||||
|  username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" | ||||
|  nickname="ruslan" | ||||
|  avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" | ||||
|  subject="comment 2" | ||||
|  date="2024-06-06T11:23:34Z" | ||||
|  content=""" | ||||
| Thank you for the heads up!  | ||||
| 
 | ||||
| I've actually looked into DataLad, and have been using git-annex with submodules. | ||||
| 
 | ||||
| The problem I found with submodules is that they require a lot of additional steps for adding, moving, deleting, and syncing them. It is a very manual process, with a lot of complexity and some rough edge cases. They also interfere with some git-annex functionality, such as metadata-driven views, I believe. So I'm using submodules very sparingly, only when I really need them. | ||||
| 
 | ||||
| As for DataLad, it looks like a mature and well-supported project; I would love to see more feedback and reviews on it. | ||||
| """]] | ||||
|  | @ -34,7 +34,13 @@ For June's work on [[design/passthrough_proxy]], implementation plan: | |||
| 1. Add `git-annex updateproxy` command and remote.name.annex-proxy | ||||
|    configuration. (done) | ||||
| 
 | ||||
| 2. Test implementation of remote instantiation for proxies. | ||||
| 2. Remote instantiation for proxies almost works, but fails at: | ||||
|    "git-annex: cannot determine uuid for origin-foo" | ||||
| 
 | ||||
|    getRepoUUID does not look at the Repo's UUID setting, but reads it | ||||
|    from git-config. It's not set there for a proxied remote. | ||||
| 
 | ||||
|    So: Add annex-uuid parsing to RemoteConfig. | ||||
| 
 | ||||
| 3. Implement proxying in git-annex-shell. | ||||
| 
 | ||||
|  |  | |||
|  | @ -0,0 +1,10 @@ | |||
| [[!comment format=mdwn | ||||
|  username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd" | ||||
|  nickname="ruslan" | ||||
|  avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9" | ||||
|  subject="comment 1" | ||||
|  date="2024-06-05T16:53:50Z" | ||||
|  content=""" | ||||
| Yes, limiting it to a single file would be sufficient for the use case I encountered, and would keep it simple from the usage / user interface standpoint, IMHO. | ||||
| Looking forward to this! | ||||
| """]] | ||||
	 Joey Hess