Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2017-05-10 14:37:11 -04:00
commit 1b4425bbc2
GPG key ID: DB12DB0FF05F8F38
5 changed files with 154 additions and 0 deletions

@ -0,0 +1,88 @@
Hi,
I'm trying to get my head around groups, wanted, etc. for a particular use case.
**Problem:** I can't work out how to get a source(?) repository to automatically drop files when they hit a transfer repository.
I have a machine (`Machine 1`) that is used for data acquisition but it is behind a strict firewall (both physical and virtual). I usually physically carry a USB drive over and set up an rsync (ssh -> local USB drive) from the one machine (`Machine 2`) that is able to connect to `Machine 1` over the network. As it is a pain to lug the drive over, I only do this rsync maybe weekly, so it takes many hours (~24) to complete. Then (when I remember) I visit again and carry the USB drive back... Naturally, this slows down my work process.
What I was hoping to do was set up git-annex with the assistant to help me. I am able to run the assistant, but not the webapp, on `Machine 1` and `Machine 2`. :-(
My thought was - as these have to be disconnected network transfers...
- `Repository 1 -> Repository 2` (when space permits)
- `Repository 2 -> Repository 3` (when space permits) `-> Repository 4` (USB drive(s))
Another limitation is that `Repos/Machines 2 & 3` have limited storage space.
As a test case I can set up (`Repo1 -> Repo2`) and (`Repo2 -> Repo3`) (on other machines, but the commands should be the same...)
After reading a bit I changed the [preferred content](/preferred_content/standard_groups/) for a transfer repo to:
```
not (inallgroup=client and copies=client:1) and ($client)
```
i.e. I changed `copies=client:2` to `copies=client:1`.
---
Finally...The question
----------------------
**BUT** I can't work out how to get `Repo1` (the source) to automatically drop the files when they hit `Repo2` (what I'm guessing should be a transfer repository).
Can anyone suggest how to automagically do this with the assistant?
---
If it would help I can share the git-annex commands I've been using, but as I'm only testing at the moment, I'm happy to start from scratch if there is an RTFM page out there. :-)
I've put some details about my thoughts on the repositories and restrictions below.
Thanks - Olaf
Repository 1
------------
- Type: source (Data collection)
- Human readable directory structure
- Physically: Machine 1
- Strict firewall: only incoming network connections from Machine 2
- Storage: 50GB
Repository 2
------------
- Type: transfer
- Physically: Machine 2
- Reasonably relaxed firewall, can talk to Repository 3
- Limited storage: 10GB
Repository 3
------------
- Type: transfer
- Physically: Machine 3
- Reasonably relaxed firewall, can talk to Repository 2
- Limited storage: 10GB
- Connected to USB drive(s)
Repository 4, 5, ...
--------------------
- Type: ? Client ?
- Human readable directory structure
- Physically: USB drive
- Usually (but not always) connected to machine 3
- Large storage (2TB) + Additional drives
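
The repository types listed above correspond to git-annex's standard groups. A minimal sketch of assigning them, assuming the standard `source` group behaves as the standard groups page describes (a source repository does not want to keep content once another repository has it), and that each pair of commands is run inside the repository it names. The `run` helper and the `DRY_RUN` variable are hypothetical additions for illustration; by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default, execute them when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# In Repository 1 (data collection): join the standard source group
# and use that group's standard preferred content expression.
run git annex group here source
run git annex wanted here standard

# In Repository 2 (and likewise Repository 3): join the transfer group.
run git annex group here transfer
run git annex wanted here standard
```

To keep a customised transfer expression such as the one shown earlier, that expression would take the place of `standard` in the last command.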

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 4"
date="2017-05-10T09:21:34Z"
content="""
> And I doubt CandyAngel was counting only the sizes of symlinks and not git repos or at least directory inodes to hold all the symlinks.)
In that repository, it is only top level directories (no sub directories) and each directory in it only has symlinks (up to 8000 of them). Directories are **mkdir $(uuidgen -r)**, hence the wildcard for du.
It would be including the directory size to hold all the inodes, but it definitely *isn't counting .git* as this annex spans 3 drives with 6TB of content so far. Well, 6 drives because of \"numcopies 2\" :P
I will calculate this a different way and only count symlinks, when I have access to it again.
"""]]

@ -0,0 +1,25 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 5"
date="2017-05-10T12:44:08Z"
content="""
    $ find -name .git -prune -o -type l | wc -l
    1034886

Just over a million symlinks.. very convenient :)

    $ find -name .git -prune -o -type l -printf '%s\n' | awk '{sum+=$1} END {print sum/1024**3}'
    195.9 # 195MB actual size
    $ find -name .git -prune -o -type l -print0 | du -ch --files0-from=- | tail -n1
    4.0G total # 4GB disk usage

And in comparison to my earlier comment 2 weeks ago:

    $ du -shc *-* | tail -n3
    33M fd79bbd4-d41e-4ea8-acc8-86437c5eed7c
    33M ffbd042e-f6d9-4450-9a57-8ed1086f587c
    4.1G total

So directory inode sizes are dwarfed by the symlinks themselves, which each take 4K of disk usage but only ~198 bytes of actual content (~96% wasted space?).
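
The wasted-space estimate can be checked with a little shell arithmetic. This is only a sketch: the function name is made up for illustration, the 4096-byte block size is an assumption based on the 4K figure, and integer division rounds the result down.

```shell
# Percentage of an allocated block left unused by a small file or symlink.
# Integer arithmetic only, so the result is rounded down.
wasted_percent() {
    actual=$1   # bytes actually used by the symlink
    block=$2    # bytes allocated on disk (assumed to be one 4K block)
    echo $(( (block - actual) * 100 / block ))
}

wasted_percent 198 4096
```

With these inputs the result lands around 95%, in line with the rough estimate.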
"""]]

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 6"
date="2017-05-10T12:45:59Z"
content="""
Oops,

    find -name .git -prune -o -type l -printf '%s\n' | awk '{sum+=$1} END {print sum/1024**3}'

should have been

    find -name .git -prune -o -type l -printf '%s\n' | awk '{sum+=$1} END {print sum/1024**2}'

That'll teach me to prematurely copy it :P
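
A minimal sketch of why the divisor matters, using plain shell integer arithmetic rather than the awk expression. The function names and the 200 MiB input are made up for illustration: dividing a byte count by 1024 twice gives MiB, while dividing three times gives GiB, which rounds a sub-gigabyte total down to 0.

```shell
# Convert a byte count to whole MiB or whole GiB (integer division, rounded down).
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }
bytes_to_gib() { echo $(( $1 / 1024 / 1024 / 1024 )); }

# Illustrative value, not taken from the repository: exactly 200 MiB.
bytes_to_mib 209715200   # divisor 1024^2
bytes_to_gib 209715200   # divisor 1024^3
```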
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="https://launchpad.net/~barthelemy"
nickname="barthelemy"
avatar="http://cdn.libravatar.org/avatar/e99cb15f6029de3225721b3ebdd0233905eb69698e9b229a8c4cc510a4135438"
subject="comment 3"
date="2017-05-09T23:38:27Z"
content="""
Hi Joey,
thank you for the clarification (and for git annex, and for all the rest!)
Cheers
"""]]