Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2020-12-17 12:45:07 -04:00
commit c52550a6a8
GPG key ID: DB12DB0FF05F8F38
5 changed files with 130 additions and 0 deletions

@@ -0,0 +1,29 @@
There is a syntax error in commit f29d49d47 that is preventing git-annex from building:
```
Command/Move.hs:214:36: error: parse error on input `->'
    |
214 |                         Right True -> return True
    |                                    ^^
```
The following patch fixes it:
```
diff --git a/Command/Move.hs b/Command/Move.hs
index 584565648..d8c4bc4c9 100644
--- a/Command/Move.hs
+++ b/Command/Move.hs
@@ -210,7 +210,7 @@ fromOk :: Remote -> Key -> Annex Bool
 fromOk src key
 	-- check if the remote contains the key, when it can be done cheaply
 	| Remote.hasKeyCheap src =
-		Remote.hasKey src key >>=
+		Remote.hasKey src key >>= \x -> case x of
 			Right True -> return True
 			-- Don't skip getting the key just because the
 			-- remote no longer contains it if the log
```
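A minimal, self-contained sketch of why the patch is needed: an alternative such as `Right True -> ...` is only valid inside a `case ... of` (or `\case`) block, so the right-hand side of `>>=` must be a function that does the case analysis. The names `hasKeySketch`, `fromOkSketch` and `fromOkLambdaCase` are hypothetical stand-ins, not git-annex's real `Remote.hasKey` or `fromOk`; `IO` replaces the `Annex` monad, and the `Left` type is assumed to be `String`.

```haskell
{-# LANGUAGE LambdaCase #-}
module Main where

-- Hypothetical stand-in for Remote.hasKey. IO replaces the Annex monad,
-- and Left is assumed to carry an error message.
hasKeySketch :: String -> IO (Either String Bool)
hasKeySketch key = return (Right (key == "present"))

-- The patched form: bind the result to x, then case on it.
fromOkSketch :: String -> IO Bool
fromOkSketch key =
  hasKeySketch key >>= \x -> case x of
    Right True -> return True
    -- Simplified: the real fromOk does more work in this branch.
    _ -> return False

-- The same thing written with the LambdaCase extension.
fromOkLambdaCase :: String -> IO Bool
fromOkLambdaCase key =
  hasKeySketch key >>= \case
    Right True -> return True
    _ -> return False

main :: IO ()
main = do
  a <- fromOkSketch "present"
  b <- fromOkLambdaCase "missing"
  print (a, b) -- prints (True,False)
```

The `\x -> case x of` form used in the patch needs no language extension; `\case` is shorter but requires enabling `LambdaCase`.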
[[!meta author=jwodder]]
[[!tag projects/datalad]]

@@ -0,0 +1,13 @@
Does anybody know if it's possible to rsync files into an annex folder without constantly overwriting already annexed files? I know that the `-L` option will de-reference links on the source side, but I don't see any option for doing the same on the target side.
I have two machines -- alpha and beta. Alpha doesn't know anything about git annex, but knows how to rsync. I have a folder on alpha that I regularly sync over to beta with:
```
rsync -av folder/ beta:/.../annex/folder
```
That way I know that I can eventually safely delete any material in the folder on alpha, because it's been backed up/archived to my annex. The problem is that every time I add the files to the annex on beta, they get replaced with symlinks. That wouldn't be a problem, except that the next time I run the rsync command, all of the files get retransmitted because they don't match the symlinks on beta. This doesn't cause a problem on the git-annex side, but it significantly slows down the rsync process.
Is this a case for an rsync remote? (I haven't really figured out special remotes yet.) Or is there a typical workflow on the git annex side that I could be using to fix this (like `import` rather than `add`)?
Thanks!

@@ -0,0 +1,23 @@
[[!comment format=mdwn
username="dscheffy@c203b7661ec8c1ebd53e52627c84536c5f0c9026"
nickname="dscheffy"
avatar="http://cdn.libravatar.org/avatar/62a3a0bf0e203e3746eedbe48fd13f6d"
subject="How do I `man` annex config options?"
date="2020-12-17T10:39:40Z"
content="""
Right after posting this last night, I came across [this](https://git-annex.branchable.com/forum/How_to_prevent_copies_on_a_single_device_and_use_only_hardlinks/) forum entry. It led me to the tip on how to create a cache annex, and from there to a little more detail on the `annex.hardlink` and `annex.thin` options.
They sound like they might be what I'm looking for, but I'm not sure where to find documentation on how to use them or what exactly they do. They aren't listed among the supported options when I run (on v6.2):
```
git annex help config
```
I googled `git annex.hardlink` and `git annex.thin`, but the results just point me to bugs, forum posts, or tips that mention the settings.
Are these annex-wide settings? (That seems to be the case.) Is it possible to apply them at the folder level? Or am I just missing the point of lock/unlock?
I'll keep looking and run some experiments on my own.
Thanks again!
"""]]

@@ -0,0 +1,11 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 2"
date="2020-12-17T16:06:44Z"
content="""
> How do I `man` annex config options?

The options are listed in the `man git-annex` output (which is also
available at <https://git-annex.branchable.com/git-annex/>).
"""]]

@@ -0,0 +1,54 @@
[[!comment format=mdwn
username="dscheffy@c203b7661ec8c1ebd53e52627c84536c5f0c9026"
nickname="dscheffy"
avatar="http://cdn.libravatar.org/avatar/62a3a0bf0e203e3746eedbe48fd13f6d"
subject="Duplicate content creates frustrating cycles"
date="2020-12-16T17:10:52Z"
content="""
I'm currently cleaning up 3 machines (with the goal of eventually upgrading their OSes) and 2 large external drives filled with 10-plus years of backups, so my current situation is somewhat temporary and may not apply to others.
I've started using preferred content to manage which repos hang onto which content. My main cleanup workflow involves moving files into a staging repository, adding them to the annex, and then letting the preferred content settings figure out where to send the content. If I know exactly where I want the content to go, I move it directly into the appropriate folder; if I haven't figured that out yet, I sometimes just put it in a `stage` folder. For simplicity, the preferred content settings below assume a single `big` external drive that should get everything except the contents of the `stage` directory; in reality, that content is split across the two drives I mentioned:
```
$ git annex wanted big
include=* and exclude=stage/*
$ git annex wanted stage
include=stage/*
```
I noticed the other day that I had some missing content in `big/photo/raw`, so I went into that folder and ran `git annex get .` to rehydrate the missing files.
Today I staged some new files and ran the following from my staging annex:
```
git annex add stage
git commit -m 'stage some new photos'
git annex sync --content
```
This is when I noticed some weirdness:
```
pull big
...
ok
(merging big into stage...)
(recording state in git...)
copy photos/raw/pict0001.jpg (to big...)
SHA256E-abc--xyz.jpg
(checksum...) ok
drop photos/raw/pict0001.jpg ok
...
get stage/cats.jpg (from big...)
SHA256E-abc--xyz.jpg
(checksum...) ok
drop big stage/cats.jpg ok
pull big
```
Basically, if two copies of the same content live in two different files that have an affinity for two or more mutually exclusive annexes, it seems like the rule that applies to the last file in the directory tree arbitrarily wins out in the end. It also means that, in such a situation, you're going to see a strange dance like this every time you run `git annex sync --content`, as the content moves across annexes only to make its way back to where it started.
I'm currently running v6.2, so maybe this has been fixed in the interim. Has anybody else seen this? Do standard groups address this problem? I started out trying to use standard groups, but fell back on my own custom folder definitions when I couldn't figure out how to keep my standard groups from grabbing more content than I wanted them to.
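A toy model of the behaviour described above, as a minimal Haskell sketch. It assumes, as the log suggests, that `git annex sync --content` checks each file's path against the preferred content expressions separately, so a key referenced by both `photos/raw/pict0001.jpg` and `stage/cats.jpg` is sent to `big` for the first path and fetched back into `stage` for the second. The predicates `wantedBig` and `wantedStage` and the function `decide` are hypothetical simplifications, not git-annex's preferred content implementation.

```haskell
module Main where

import Data.List (isPrefixOf)

-- Hypothetical, simplified versions of the two preferred content
-- expressions above; git-annex's real matcher is far more general.
wantedBig :: FilePath -> Bool
wantedBig path = not ("stage/" `isPrefixOf` path) -- include=* and exclude=stage/*

wantedStage :: FilePath -> Bool
wantedStage path = "stage/" `isPrefixOf` path -- include=stage/*

-- Toy model: each file's own path decides where its key should live.
decide :: FilePath -> String
decide path
  | wantedBig path = "big"
  | wantedStage path = "stage"
  | otherwise = "neither"

main :: IO ()
main = do
  -- Both paths refer to the same key (SHA256E-abc--xyz.jpg in the log).
  let paths = ["photos/raw/pict0001.jpg", "stage/cats.jpg"]
  mapM_ (\p -> putStrLn (p ++ " -> " ++ decide p)) paths
  -- Walking the files in order, the last decision is where the content stays.
  putStrLn ("content ends up in: " ++ last (map decide paths))
```

Under these assumptions the program prints `big` for the first path, `stage` for the second, and reports that the content ends up in `stage`, which matches the copy/drop/get/drop sequence in the log.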
Thanks!
"""]]