Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2024-11-25 12:16:32 -04:00
commit 2917caaba3
GPG key ID: DB12DB0FF05F8F38
3 changed files with 84 additions and 0 deletions


@ -0,0 +1,50 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="re: How to get a list of all NOT unused files"
date="2024-11-25T02:23:25Z"
content="""
> How to get a list of all NOT unused files

There may be a simpler way, but one idea:

* list all unused keys
* list all present keys
* filter out the unused keys from the present keys

So something like this:
```
$ git annex findkeys | sort >present-keys
$ git annex unused --json | jq -r '.\"unused-list\" | to_entries | map(.value) | .[]' | sort >unused-keys
$ comm -2 -3 present-keys unused-keys
```
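The filter step works because `comm -2 -3` prints only the lines unique to its first input, and it needs both inputs sorted the same way. Here is a tiny self-contained illustration that runs without a repository. It uses made-up key names (hypothetical, not real annex keys) and scratch files named `demo-present` and `demo-unused` so it won't clobber the real lists:

```
# Made-up key names (hypothetical), only to show what comm -2 -3 keeps.
printf '%s\n' KEY-C KEY-A KEY-B | LC_ALL=C sort >demo-present
printf '%s\n' KEY-B | LC_ALL=C sort >demo-unused
# -2 drops lines found only in demo-unused, -3 drops lines found in both,
# leaving the keys that are present but not unused.
LC_ALL=C comm -2 -3 demo-present demo-unused   # prints KEY-A, then KEY-C
```

Forcing `LC_ALL=C` only makes the demo's sort order deterministic. In the real pipeline both lists go through the same `sort`, so they already agree.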
> Those that should be saved are tagged

If you wanted to focus just on keys referenced from tags, you could
generate a list of those keys with
```
$ git rev-list --objects --no-object-names --no-walk --tags | \
git annex lookupkey --ref --batch | grep -v '^$'
```
---
After generating a list of keys with either of those approaches, you
could copy them to your new repo with
```
git annex copy --to=NEW-REMOTE --batch-keys ...
```
For example, the full pipeline for the second approach could be
```
$ git rev-list --objects --no-object-names --no-walk --tags | \
git annex lookupkey --ref --batch | grep -v '^$' | \
git annex copy --to=NEW-REMOTE --batch-keys
```
"""]]


@ -0,0 +1,18 @@
Has anyone built something where devices may not have a direct connection to other git-annex devices but can still push out a request for a file? Basically, something that lets them post file requests that other devices then pick up and relay to a shared endpoint, which can then be populated with the requested file?

I'm currently thinking of a situation where a cloud node is 'expensive' and purges itself once some other devices contain the file. At a later point, a mobile device (which has neither the file nor a direct connection to the devices containing it) may connect to the cloud node and request the file.
## Possible behavior
The requester could populate a 'git-annex-requests' file at the root of the repository with lines similar to the following:
```text
file-shasum, requester-id, (optional endpoint1), (optional endpoint2),...
```
This 'git-annex-requests' file would require at minimum the file-shasum and the requester id. The optional endpoints help the devices that contain the desired file know where best to push it, instead of guessing or pushing to all available remotes.

So, for the attached diagram, where the mobile laptop attached to the cellphone wants a file from the remote-office nas/server, the flow would look like this:

1. The laptop updates the request file and syncs it to the phone.
2. The phone syncs the request file to the homelab server.
3. The homelab server lets the home-office computer sync the request file.
4. The request file is synced to the home-office nas/server.
5. The home-office computer gets the requested file from the nas and pushes it to the homelab server.
6. The mobile phone downloads the file from the homelab server.
7. The mobile laptop gets the file from the phone and removes the request from the requests file.

Removing the request then triggers the reverse propagation of the acknowledgement/removal of the request, which lets the devices proceed with any garbage cleanup.
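To make the proposed format concrete, here is a minimal reader for it. This is only a sketch under stated assumptions: git-annex has no 'git-annex-requests' file today, the comma-separated layout is the one proposed above, and the `parse_requests` function, hashes, and device names are hypothetical.

```sh
#!/bin/sh
# Sketch only: git-annex does not read a git-annex-requests file. This
# assumes the comma-separated format proposed above:
#   file-shasum, requester-id, (optional endpoint1), (optional endpoint2), ...
# parse_requests FILE prints one line per request:
#   <file-shasum> <requester-id> [endpoints...]
parse_requests() {
    awk -F',' '
        NF == 0 { next }    # skip blank lines
        {
            # trim surrounding whitespace from every field
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if (NF < 2 || $1 == "" || $2 == "") {
                print "skipping malformed request on line " NR > "/dev/stderr"
                next
            }
            out = $1 " " $2
            for (i = 3; i <= NF; i++) if ($i != "") out = out " " $i
            print out
        }' "$1"
}

# Demo with a made-up request file (hashes and device names are hypothetical).
tmp=$(mktemp -d)
cat >"$tmp/git-annex-requests" <<'EOF'
1f2e3d4c, mobile-laptop, homelab-server, mobile-phone

9a8b7c6d, mobile-phone
EOF
parse_requests "$tmp/git-annex-requests"
```

Using awk keeps the sketch POSIX-portable. Splitting on commas does mean that an endpoint can never contain a comma itself.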
Additionally, an optional 'git-annex-routing' file could include netlist details describing routing chains, i.e. which 'static' devices can easily push to each other, so that other git-annex clients can make more informed decisions about where to push a file.
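The routing file's format isn't specified. As one hypothetical option, each line could name two devices, meaning a file can be pushed directly from the first to the second. A client could then pick a chain of hops with a breadth-first search, sketched below in awk. The file layout, the `find_route` function, and the device names (taken loosely from the diagram) are all assumptions.

```sh
#!/bin/sh
# Sketch only: the git-annex-routing file is a proposal with no fixed format.
# Here each line is assumed to be "from-device to-device", meaning a file can
# be pushed directly from the first device to the second.
# find_route FILE FROM TO prints one shortest chain of devices (breadth-first search).
find_route() {
    awk -v src="$2" -v dst="$3" '
        NF >= 2 { adj[$1] = adj[$1] " " $2 }    # one-way adjacency list per device
        END {
            head = 1; tail = 1
            queue[tail++] = src; seen[src] = 1
            while (head < tail) {
                cur = queue[head++]
                if (cur == dst) break
                n = split(adj[cur], hops, " ")
                for (i = 1; i <= n; i++) {
                    hop = hops[i]
                    if (!(hop in seen)) {
                        seen[hop] = 1; prev[hop] = cur; queue[tail++] = hop
                    }
                }
            }
            if (!(dst in seen)) { print "no route from " src " to " dst > "/dev/stderr"; exit 1 }
            # walk the predecessor links back from dst to build the path
            path = dst
            for (node = dst; node != src; node = prev[node]) path = prev[node] " -> " path
            print path
        }' "$1"
}

# Demo with a hypothetical routing file.
tmp=$(mktemp -d)
cat >"$tmp/git-annex-routing" <<'EOF'
home-office-nas home-office-computer
home-office-computer homelab-server
homelab-server mobile-phone
mobile-phone mobile-laptop
EOF
find_route "$tmp/git-annex-routing" home-office-nas mobile-laptop
# prints: home-office-nas -> home-office-computer -> homelab-server -> mobile-phone -> mobile-laptop
```

Because the edges are one-way, this demo file gives no route from the laptop back to the nas. That fits the mixed inbound/outbound connections the diagrams describe.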
[[!img git_annex_request.png align="right" size="" alt="Network diagram"]]
[[!img git_annex_request_routing.png align="right" size="" alt="Network diagram showing file routing"]]
## Alternate links for images
Example diagram with mixed network connections: [[https://imgur.com/gallery/network-diagram-with-mixed-inbound-outbound-connections-3q76OzI]]
Example diagram with network request across mixed network connections: [[https://imgur.com/gallery/network-diagram-file-request-across-mixed-inbound-outbound-connections-hev94Kj]]


@ -0,0 +1,16 @@
[[!comment format=mdwn
username="aaron"
avatar="http://cdn.libravatar.org/avatar/8a07e2f7af4bbf1bfcb48bbc53e00747"
subject="Overriding git folder"
date="2024-11-25T02:55:25Z"
content="""
It seems that git has gotten smarter and now actively prevents you from adding a directory that contains its own `.git` folder (I did this many years ago, before I learned about submodules). I'd like to do something like the following:
```bash
git init --separate-git-dir=.gitannex .
git --git-dir=.gitannex annex init
# A repo I'm pulling from GitHub/wherever and don't want as a submodule, since it's not my personal project
# (REPO_URL is a placeholder for its clone URL)
git clone REPO_URL some_repo
git --git-dir=.gitannex add some_repo
```
Essentially, I can override the name of that `.git` folder, but git still checks for other `.git` folders; is there a way to disable this check?
"""]]