Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2025-05-29 13:02:28 -04:00
commit 622979432b
GPG key ID: DB12DB0FF05F8F38
2 changed files with 77 additions and 0 deletions

@ -0,0 +1,57 @@
[[!comment format=mdwn
username="Spencer"
avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55"
subject="I need help with this too (c.f. submodule refactor)"
date="2025-05-29T03:42:42Z"
content="""
I do this quite often because I use a monorepo approach with regular refactoring of subtrees into their own submodules. I have yet to find a bulletproof way to do this on the git-annex side.
The first step is as simple as running `git annex unannex` in `A`, optionally with `--include \"*\"` if pattern matching is easier.
- On the `git` side, this logs the files as deleted from the main repo (`src`, let's call her). This is ideal so that you have a record for yourself (with a descriptive commit message) of where you've moved your files to.
- On the `git-annex` side, (once you commit), the file data will eventually become \"unused\" - you'll have to do some combination of `git annex push` and `git annex sync [--cleanup]` to ensure all branches really don't reference those files (including remote branches and `synced/*` branches).
Now the question is: how do we get the data into the new repo (`dst`) and safely drop from `src`?
- You could add `dst` as a remote of `src` and pull only `dst`'s `git-annex` branch, which (after moving, re-annexing, and committing the unannexed files to `dst`) now shows as having a copy of those files. (**Warning:** this has bad side-effects).
- You could do the opposite but use `dst` to move any (used) files from `src` (**Warning:** this has bad side-effects).
- You could add `dst` as a remote and `move` unused files over (requires a clean unused stack already and having to do the push/sync stuff correctly and fully before the files can be released)
- You could do the opposite and \"copy\" the files *to* `src` first, *then* move them over to `dst`. (The copy is required because `dst` has no record of `src` holding any keys. I find it logical, albeit sad, that `git-annex` can't dynamically poll local repos' annexes for file content.)
- You could forcibly drop the data either by individual key or once it eventually becomes unused (super unsafe and sad)
### Conclusions
- Keep a clean unused stack (`git annex unused` gives nothing) as much as you can, and clean it out before testing out any sort of move/drop operations like this.
- Option 4 is the best so far. Following the initial step of `git annex unannex` in `src`:
- Add `src` as a remote in `dst`, `mv` the files into `dst`, `git annex add` the files in `dst`, `git annex copy` the files from `dst` back to `src`, then do `git annex move -f <src>`
- This will only move the files known by `dst`. If it so happens that one of these files is actually duplicate data with something you want to also be in `src`, this *will* drop it and leave no record in `src` of where it went (besides your `git` commit message).
As described, there are still side effects with Option 4, but it's so far the best option I've devised.
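
The Option 4 steps can be sketched as a shell function. This is only a sketch under assumptions: `split_subtree` is a made-up name, the remote is called `src`, the subtree is a top-level directory that exists in `src` and not yet in `dst`, and the function is run from the top of `dst`. The side effects described above still apply.

```shell
# Hypothetical helper, run from the top of dst.
#   $1 = path to the src repo, $2 = top-level directory to split out
split_subtree() (
    set -e                      # stop at the first failing step
    src_repo=$1
    subtree=$2

    # In src: unannex the files, then commit so git records their removal.
    git -C "$src_repo" annex unannex "$subtree"
    git -C "$src_repo" commit -m "Move $subtree out of src"

    # In dst: add src as a remote, take over the files, and annex them here.
    git remote add src "$src_repo"
    mv "$src_repo/$subtree" .
    git annex add "$subtree"
    git commit -m "Import $subtree from src"

    # Copying back makes dst record that src holds these keys;
    # move --from then drops them from src, since dst already has the content.
    git annex copy --to src "$subtree"
    git annex move --from src "$subtree"
)
```

A call would look like `split_subtree ../src data`, where `../src` and `data` are placeholder names. The body runs in a subshell so that `set -e` does not leak into the calling shell.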
Oh, and if you want to keep `src` around as a remote on `dst` to e.g. remind yourself of various relations, make sure you configure it in `.git/config` with:
- `remote.<name>.annex-sync=false`. This skips it when you do a `git annex sync`
- Delete the `remote.<name>.fetch` spec, or add `remote.<name>.skipFetchAll=true`. This ensures `git fetch` doesn't fetch all the branches and unrelated objects
- (pray there are no more side-effects)
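
As a rough sketch, the two settings above can be applied with `git config` instead of editing `.git/config` by hand. The function name `quiet_remote` is made up for illustration. It assumes the remote already exists and that it is run inside `dst`.

```shell
# Hypothetical helper: keep a remote around without syncing or bulk-fetching it.
#   $1 = remote name, e.g. src
quiet_remote() {
    # git-annex setting: leave this remote out of git annex sync.
    git config "remote.$1.annex-sync" false
    # git setting: skip this remote in git fetch --all and git remote update.
    git config "remote.$1.skipFetchAll" true
}
```

Usage would be `quiet_remote src`. Note that `skipFetchAll` only affects bulk fetches, so an explicit `git fetch src` still fetches unless the fetch spec is removed as well.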
Now, what happens if a side-effect does happen and it looks like you lost some content and don't know where it went? `git annex whereis` is no help.
Instead, you have to extract the key from the now broken symlink and run `find <> -type f -iname \"<KEY>\"`. Easy enough but kind of scary when it happens to you.
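
A small sketch of that recovery step, assuming the usual layout where the annex symlink's target ends in the key name. Both function names are made up for illustration.

```shell
# Print the annex key that a (possibly broken) annex symlink points to.
annex_key_of() {
    basename "$(readlink "$1")"
}

# Search a directory tree (another clone, a backup, ...) for that key's content.
#   $1 = directory to search, $2 = the broken symlink
find_annex_key() {
    find "$1" -type f -name "$(annex_key_of "$2")"
}
```

For example, `find_annex_key /path/to/other/repo lost-file` prints every regular file under that directory whose name equals the key. This uses `-name` where the command above uses `-iname`, on the assumption that an exact-case match is enough.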
### Side-Effects of Option 1+2: `git-annex` synchronization
*DON'T DEAD OPEN INSIDE*
While this is currently the only way to propagate annex key information, it has bad side-effects:
- Remotes and known repos start to clutter whichever repo absorbs the other's `git-annex` branch. For me this is a no-go because I have redundant remotes (an exporttree called `dropbox` in my case)
- If you decide to `dead` these remotes or repos and by coincidence the `git-annex` branch is later absorbed in the other direction, chaos ensues (`dead` is propagated, remote annex key history is killed: especially gross for export/importtrees)
- Best way to avoid this is to `dead`, `forget --drop-dead` then `semitrust UUID`. Many steps, potentially undefined condition. Gross.
## Potential Feature Requests
Ideally, `git-annex` could intelligently scan another repo's annex and populate information about what keys it has, based simply on which keys are actually present in `.git/annex/objects`. This pulls in the information we care about without cluttering it with additional information relevant only to each respective repo.
Then, presuming you've set up a remote (`dst`) pointing to this repo (`src`) and run `git annex info`, `src` should have a list of the keys that are inside `dst`, `git annex whereis` from `src` will identify the keys inside `dst`, and `drop` will happily do so.
- Maybe there could be something called an `acquaintance` repo that is not allowed to be synced, pulled, fetched, pushed to.
- Acquaintances are semitrusted because they're still annex-controlled.
- On removing an acquaintance repo and running `git annex forget`, the list of keys is wiped.
"""]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="guez@e17c318e09fc77b4a5be4cd330364e3a41a96971"
nickname="guez"
avatar="http://cdn.libravatar.org/avatar/ffec09075c5b5cd47832649a306d68c3"
subject="Not enough information on special remotes"
date="2025-05-28T21:58:23Z"
content="""
You say that the command shows the URL used for a WebDAV remote, but this no longer seems to be the case:
```
$ git annex info sdrive
uuid: d17d5946-d126-4a0e-b6c1-232fb34fb461
description: sdrive
trust: semitrusted
remote annex keys: 1
remote annex size: 249.11 kilobytes
```
I can get a list of special remotes with `git annex enableremote` but how can I get a more detailed list, with all the information on each special remote: the type, the configuration options (encryption or not, etc.), the URLs?
"""]]