Merge branch 'master' into proxy

Joey Hess 2024-06-07 10:43:13 -04:00
commit 5aaa285083
GPG key ID: DB12DB0FF05F8F38
7 changed files with 100 additions and 1 deletions


@ -0,0 +1,35 @@
### Please describe the problem.
1. Some files remain symlinked after an aborted `git annex add` followed by a completed `git annex unannex`.
2. These files are present in `.git/annex/objects`, but `git annex unused` does not find them. Running `git annex whereused --key=SHA256E...` returns nothing.
Restoring these files and removing them from the git-annex objects folder requires manual workarounds or hacks, such as adding the file again with `git annex add` and then trying to remove it again.
### What steps will reproduce the problem?
1. Run `git annex add` and abort the operation midway (this was on a directory with a large number of files, about 3K, running with the jobs switch set to 12).
2. Run `git annex unannex` until it completes.
3. Observe that some of the added files were restored, while others are still symlinks but are not tracked by git-annex.
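The reproduction steps above could be scripted roughly as follows. This is an untested sketch, not a confirmed reproducer: it assumes the current directory is a scratch git-annex repository and that GNU coreutils `timeout` is available; the helper names `make_files` and `repro`, the file contents, and the 5-second cutoff are illustrative choices. `timeout -s INT` delivers SIGINT, which is roughly what pressing Ctrl-C does.

```sh
#!/bin/sh
# Hypothetical reproduction sketch; run only inside a scratch git-annex repo.

# Create N small files with distinct content (file1.dat .. fileN.dat),
# so each one gets its own key.
make_files() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'file %s\n' "$i" > "file$i.dat"
        i=$((i + 1))
    done
}

repro() {
    make_files 3000
    # Interrupt the parallel add with SIGINT after 5 seconds (assumed cutoff).
    timeout -s INT 5 git annex add --jobs=12 .
    # Undo the partial add.
    git annex unannex .
    # Any symlinks still present are candidates for the leftovers reported here.
    find . -path ./.git -prune -o -type l -print
}
```

Whether leftovers appear likely depends on timing, so the cutoff may need adjusting.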
### What version of git-annex are you using? On what operating system?
Debian Bookworm / git-annex version: 10.20240227-1
### Please provide any additional information below.
Similar report from another user here:
https://git-annex.branchable.com/forum/File_still_symlinked_after_git_annex_unannex/
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes, I have been using it extensively for a few years with terabytes of data.


@ -0,0 +1,22 @@
[[!comment format=mdwn
username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd"
nickname="ruslan"
avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9"
subject="comment 1"
date="2024-06-05T17:34:32Z"
content="""
The workaround of running `git annex add` again is also described at the link below:
https://git-annex.branchable.com/forum/git_annex_add_crash_and_subsequent_recovery/#comment-4f5af644597a055624009c5bbb9aca3f
---
So the fix is to find files that are symlinks into the git-annex objects folder and then run `git annex add` / `git annex unused`. I can handle that with a script, though a built-in method would be nice to have.
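A script along these lines could do the first step. It is a minimal sketch, assuming a POSIX shell with `find` and `readlink`, file names without embedded newlines, and that the leftover symlinks are absent from the git index (which the empty `whereused` output suggests, but is an assumption). The function name `find_untracked_annex_symlinks` is hypothetical. Because normally annexed files are also symlinks into `.git/annex/objects`, it skips any path git still has in its index.

```sh
#!/bin/sh
# Sketch: list symlinks that point into .git/annex/objects but are not in
# the git index, i.e. candidates for the leftovers described above.
# Assumes file names contain no newlines; run inside the work tree.
find_untracked_annex_symlinks() {
    dir=${1:-.}
    find "$dir" -path "$dir/.git" -prune -o -type l -print |
    while IFS= read -r link; do
        case "$(readlink "$link")" in
            *.git/annex/objects/*)
                # Skip regular annexed files, which git still tracks.
                if ! git ls-files --error-unmatch -- "$link" >/dev/null 2>&1; then
                    printf '%s\n' "$link"
                fi
                ;;
        esac
    done
}
```

Its output could then be fed, one path per line, to `git annex add` as in the linked workaround.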
---
Additional notes:
1. There should be a way to find files that were added to the git-annex objects folder but are not tracked by git-annex. Can this be done with existing commands?
2. It would be desirable to have a way to abort a long-running `git annex add` gracefully. Is there a way to do that now? Ctrl-C seems to have left things in a broken state. Would Ctrl-Z work better?
"""]]


@ -0,0 +1,3 @@
As I understand it, there is currently no way to track metadata for directories with `git annex metadata` (it only works for files). Is that indeed the case?
One workaround I'm looking at is to add a placeholder file inside each directory to hold that directory's metadata. As I understand it, each such file would need some unique content (perhaps a UUID); otherwise, since metadata is attached to the file's key (derived from its content), placeholder files in different directories would share a key and their metadata would collide. Are there alternatives or better solutions for tracking dataset metadata (groups of files in a folder)?
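The placeholder-file workaround could look roughly like this. It is a sketch under assumptions: the placeholder name `.dirmeta` and the function name `add_dir_metadata` are hypothetical, `uuidgen` is assumed to be installed, and it must run inside a git-annex repository using a content-hashing backend such as the default SHA256E.

```sh
#!/bin/sh
# Sketch: attach metadata to a directory via a placeholder file inside it.
# Hypothetical names: add_dir_metadata, .dirmeta. Requires uuidgen.
add_dir_metadata() {
    dir=$1
    shift
    placeholder="$dir/.dirmeta"
    # Unique content per directory gives each placeholder its own key,
    # so metadata set on one directory does not leak to another.
    if [ ! -e "$placeholder" ]; then
        uuidgen > "$placeholder" || return 1
    fi
    git annex add "$placeholder" || return 1
    # Each remaining argument is a field=value pair, e.g. dataset=run1.
    for pair in "$@"; do
        git annex metadata --set "$pair" "$placeholder" || return 1
    done
}
```

For example, `add_dir_metadata data/run1 dataset=run1` would annex `data/run1/.dirmeta` and set `dataset=run1` on it.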


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="comment 1"
date="2024-06-06T09:09:03Z"
content="""
You are absolutely right. You might be interested in [DataLad](https://datalad.org), which provides a lot of convenience around git-annex, has the concept of datasets (git submodules) and also an extended approach to metadata.
"""]]


@ -0,0 +1,15 @@
[[!comment format=mdwn
username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd"
nickname="ruslan"
avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9"
subject="comment 2"
date="2024-06-06T11:23:34Z"
content="""
Thank you for the heads up!
I've actually looked into DataLad, and have been using git annex with submodules.
The problem I found with submodules is that they require a lot of additional steps for adding, moving, deleting, and syncing them: a very manual process, with a lot of complexity and some rough edge cases. I believe they also interfere with some git-annex functionality, such as metadata-driven views. So I'm using submodules very sparingly, only when I really need them.
As for DataLad, it looks like a mature and well-supported project; I would love to see more feedback and reviews of it.
"""]]


@ -34,7 +34,13 @@ For June's work on [[design/passthrough_proxy]], implementation plan:
1. Add `git-annex updateproxy` command and remote.name.annex-proxy
configuration. (done)
2. Test implementation of remote instantiation for proxies.
2. Remote instantiation for proxies almost works, but fails at:
"git-annex: cannot determine uuid for origin-foo"
getRepoUUID does not look at the Repo's UUID setting, but reads it
from git-config. It's not set there for a proxied remote.
So: Add annex-uuid parsing to RemoteConfig.
3. Implement proxying in git-annex-shell.


@ -0,0 +1,10 @@
[[!comment format=mdwn
username="ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd"
nickname="ruslan"
avatar="http://cdn.libravatar.org/avatar/37d3c852372d96daa8a99629755ed1f9"
subject="comment 1"
date="2024-06-05T16:53:50Z"
content="""
Yes, limiting it to a single file would be sufficient for the use case I encountered, and would keep it simple from a usage / user-interface standpoint, IMHO.
I look forward to this!
"""]]