Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2024-06-20 11:03:30 -04:00
commit d89ac8c6ee
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 83 additions and 0 deletions

@ -0,0 +1,25 @@
### Please describe the problem.
Very rarely, one of our unit tests errors out with something like
```
2024-06-18T03:18:50.5586670Z E datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false annex find --anything --include '*' --json --json-error-messages -c annex.dotfiles=true' failed with exitcode 139 under /private/var/folders/b3/2xm02wpd21qgrpkck5q1c6k40000gn/T/datalad_temp_test_path_diff49gfi408 [info keys: stdout_json] [err: 'error: git-annex died of signal 11']
2024-06-18T03:18:50.5588210Z
2024-06-18T03:18:50.5588730Z /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datalad/runner/runner.py:242: CommandError
```
Unfortunately, no more information is captured. I just wanted to ask for ideas on what could lead to git-annex dying of signal 11 (exit code 139), and maybe what data to collect.
original report: [datalad/issues/7490](https://github.com/datalad/datalad/issues/7490)
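
For reference, the exit code 139 in the log above is consistent with the "died of signal 11" message: a wrapper reporting a signal-terminated child conventionally exits with 128 plus the signal number. A minimal sketch of that decoding, assuming a POSIX system (the helper name `signal_from_exit_code` is hypothetical and not part of datalad or git-annex):

```python
import signal

def signal_from_exit_code(code):
    # Hypothetical helper: map a "128 + N" style exit code to a signal name.
    # Returns None when the code does not encode a known signal.
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None

print(signal_from_exit_code(139))
```

On Linux and macOS, signal 11 is SIGSEGV, so this points at a segmentation fault rather than a regular git-annex error exit.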
### What steps will reproduce the problem?
Not sure yet how feasible it would be to reproduce, since it happens really rarely.
### What version of git-annex are you using? On what operating system?
OSX. The last time it happened, git-annex came from brew: git-annex--10.20240531.arm64_sonoma.bottle.tar.gz
[[!meta author=yoh]]
[[!tag projects/repronim]]

@ -0,0 +1,8 @@
Hi, I have some large repositories on a separate disk server that I would like to be able to browse on my desktop pc or laptop.
The repositories do not fit on my client's disk, so I cannot just use `git annex get .`
One solution would be a read-only NFS mount. However, adding new files then becomes more complicated: I have to clone the repo (via ssh) to my desktop/laptop, add new files, use `git annex copy` to get them onto the server and then update the working copy there.
In addition, the readonly mount does not allow me to modify text files which are not managed by git annex.
I've been thinking about using some kind of union fs (overlayfs / mergerfs) but the dead symlinks of the local copy would probably hide the files of the NFS mount. I could probably also just symlink .git/annex/objects to the NFS mount but that sounds like a pretty unsafe and bad idea.
Any suggestions how I might solve this problem?
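
To make the "dead symlinks" concern concrete, here is a minimal sketch that lists symlinks in a working tree whose targets are missing, which is how annexed files without local content appear. The function name `dead_symlinks` is hypothetical, and skipping `.git` is an assumption about the layout:

```python
import os

def dead_symlinks(root):
    """Yield paths under root that are symlinks whose target does not exist."""
    for dirpath, dirnames, filenames in os.walk(root):
        if '.git' in dirnames:
            dirnames.remove('.git')  # do not descend into repository internals
        for name in filenames:
            path = os.path.join(dirpath, name)
            # islink() checks the link itself; exists() follows it to the target
            if os.path.islink(path) and not os.path.exists(path):
                yield path
```

Every path this yields is one that a union filesystem would likely have to resolve in favour of the NFS copy instead of the local dangling link.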

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joris"
avatar="http://cdn.libravatar.org/avatar/6fb83f8f62afd4adac91fb14b60928c6"
subject="comment 2"
date="2024-06-20T09:58:05Z"
content="""
Was this ever explored further? It would be very interesting to be able to use the metadata functionality on regular git files that are not in the annex.
"""]]

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec"
nickname="beryllium"
avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137"
subject="Grafting? a special remote for tuned migration"
date="2024-06-15T00:57:26Z"
content="""
Naively, I put myself in a position where my rather large, untuned git-annex had to be recovered, because I did not appreciate the effect of case-insensitive filesystems.
Specifically, NTFS-3G is deadly in this case. Whilst Windows has advanced, and with WSL added the ability to enable case-sensitivity on a folder (which is also inherited by the folders under it), NTFS-3G does not do this.
So beware if you try to work in an \"interoperable\" way. NTFS-3G will do mixed case, but will create child folders that are not case-sensitive.
To that end, I want to migrate this rather large git-annex to be tuned to annex.tune.objecthashlower. I already have a good strategy for this. I'll just create a completely new stream of git-annex repositories originating from a newly formed one. I will also be able to create new type=directory special remotes for my existing \"tape-out\" git-annex. I will just use git annex fsck --fast --from $remote to rebuild the location data for it.
I've also tested this with an S3 git-annex as a proof-of-concept. So in the new git-annex, I ran git-annex initremote cloud type=S3... to create a new bucket, copied over a file from the old bucket, and rebuilt the location data for that file.
But I would really like to avoid creating a new bucket. I am happy to lose the file presence/location data for the old bucket, but I'd like to graft the cloud bucket back in, or initremote it with matching parameters. The same goes, I guess, for an encrypted special remote, i.e. importing the encryption keys, etc.
Are there \"plumbing\" commands that can do this? Or does it require knowing about the low-level storage of this metadata, which seems to just send me back to the earlier comment about using a filter-branch... which I am hoping to avoid (because of all the potential pitfalls).
"""]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec"
nickname="beryllium"
avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137"
subject="comment 7"
date="2024-06-15T07:37:06Z"
content="""
I have found one way to graft in the S3 bucket. It involves running git-annex initremote cloud type=S3 <params>, which unavoidably creates a new dummy bucket (bucket=dummy can be used to identify it), and then running git-annex enableremote cloud bucket=cloud-<origuuid> to use the original bucket without having to copy/move over all the files.
I did try it in one shot with git-annex initremote cloud type=S3 bucket=cloud-<origuuid> <params>, but unfortunately it fails because the bucket creation step appears to be mandatory, and the S3 API errors out with an \"already created bucket\" type of error.
However, if there is general guidance somewhere for, I guess, importing/exporting the special remote metadata (including stored encryption keys), that would be very much appreciated.
Sorry, I should just clarify. Trying to do this via sync from the old, non-tuned git-annex repo fails with:
git-annex: Remote repository is tuned in incompatible way; cannot be merged with local repository.
I understand this given the wider branch data implications, but I don't know enough to understand why just the special remote data can't be merged in.
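
For anyone wanting to repeat the two-step graft from the top of this comment, here is a minimal sketch that only builds the command lines and does not run them. The function name `graft_commands` and its arguments are hypothetical, the remaining initremote parameters are left to the caller, and the final fsck step is the location rebuild from my earlier comment:

```python
def graft_commands(remote, orig_bucket, extra_params=()):
    # Hypothetical helper: returns argv lists only; nothing is executed here.
    return [
        # Step 1: initremote against a throwaway bucket, since bucket creation appears mandatory.
        ['git-annex', 'initremote', remote, 'type=S3', 'bucket=dummy', *extra_params],
        # Step 2: repoint the remote at the original bucket.
        ['git-annex', 'enableremote', remote, 'bucket=' + orig_bucket],
        # Step 3: rebuild the location data from what is actually in the bucket.
        ['git', 'annex', 'fsck', '--fast', '--from', remote],
    ]

for argv in graft_commands('cloud', 'cloud-<origuuid>'):
    print(' '.join(argv))
```

Printing the commands first makes it easy to review them before pasting them into a shell in the new, tuned repository.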
"""]]