Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2022-01-11 12:25:12 -04:00
commit c031d19c32
GPG key ID: DB12DB0FF05F8F38
9 changed files with 233 additions and 18 deletions

@ -0,0 +1,57 @@
I am working with a bare repository to transfer two keys from a custom backend to and from a special remote. This seems to be working fine.
In order to make use of export remotes (exporttree=yes), I need to be able to specify a tree to be exported. For technical reasons, I want to keep using a bare repository, and call `hash-object`, `update-index`, and `write-tree` manually in order to create a tree. The Python code snippet that does this looks like this:
```python
for key, prefix, fname in (
        # the prefixes are constant hashdir-lower
        (RepoAnnexGitRemote.refs_key, 'a11/1c8', '.datalad/dotgit/refs'),
        (RepoAnnexGitRemote.repo_export_key, '6b2/c13',
         '.datalad/dotgit/repo-export.zip')):
    # create a blob for the annex link
    out = repo._git_runner.run(
        ['git', 'hash-object', '-w', '--stdin'],
        stdin=bytes(
            f'../../.git/annex/objects/{prefix}/{key}/{key}', 'utf-8'),
        protocol=StdOutCapture)
    linkhash = out['stdout'].strip()
    # place link into a tree
    out = repo._git_runner.run(
        ['git', 'update-index', '--add', '--cacheinfo', '120000',
         linkhash, fname],
        protocol=StdOutCapture)
# write the complete tree, and return ID
out = repo._git_runner.run(
    ['git', 'write-tree'],
    protocol=StdOutCapture)
exporttree = out['stdout'].strip()
```
It essentially creates the two blobs for the annex links, puts them together in a tree, and writes it to the repo.
However, after this code has run, git-annex no longer operates properly in the bare repo:
```
% git annex drop --all
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
git-annex: fd:21: Data.ByteString.hGetLine: end of file
```
(the fatal error messages come from `cat-file` batch calls made internally)
When I comment this code out, everything goes back to normal. It seems to make no difference whether I follow the problematic code up with a `commit-tree` and `update-ref` to actually have the mainline branch point to a commit with that tree. It also seems to make no difference when I explicitly `setpresentkey <key> <here> 0`.
AFAICS this creates the same records as if I had done this in a regular worktree using high-level git-annex tooling. Other git-annex commands like `fsck` seem to be working fine. If I create a branch with that tree, `findref` also seems to be working properly.
Is this a bug, or am I doing something wrong? Thanks in advance for your time!
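For readers who want to reproduce the tree-building steps outside DataLad, here is a minimal sketch using only `subprocess`. The helper names (`git`, `build_link_tree`) and the placeholder key `KEY` are hypothetical and not part of DataLad or git-annex. One deliberate difference from the snippet above: the sketch points `GIT_INDEX_FILE` at a throwaway index, so no `index` file is left behind in the bare repository. Whether that avoids the `git annex drop --all` failure is an assumption that is not verified here.

```python
import os
import subprocess
import tempfile


def git(args, git_dir, index_file, stdin=None):
    # Run one git command against git_dir, using a private index file
    # instead of the repository's own $GIT_DIR/index.
    env = dict(os.environ, GIT_DIR=git_dir, GIT_INDEX_FILE=index_file)
    proc = subprocess.run(["git", *args], input=stdin, env=env,
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout.decode().strip()


def build_link_tree(git_dir, links):
    # links: iterable of (path_in_tree, symlink_target) pairs.
    # Returns the ID of a tree that contains one symlink per pair.
    with tempfile.TemporaryDirectory() as tmp:
        index_file = os.path.join(tmp, "index")
        for path, target in links:
            # Store the symlink target as a blob.
            blob = git(["hash-object", "-w", "--stdin"],
                       git_dir, index_file, stdin=target.encode("utf-8"))
            # Register the blob in the private index as a symlink (mode 120000).
            git(["update-index", "--add", "--cacheinfo", "120000", blob, path],
                git_dir, index_file)
        # Write the index content out as tree objects.
        return git(["write-tree"], git_dir, index_file)
```

A call such as `build_link_tree("/path/to/repo.git", [(".datalad/dotgit/refs", "../../.git/annex/objects/a11/1c8/KEY/KEY")])` returns the tree ID that could then be passed on to `commit-tree`.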

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="amerlyq"
avatar="http://cdn.libravatar.org/avatar/3d63c9f436b45570d45bd003e468cbd3"
subject="comment 6"
date="2022-01-08T14:32:11Z"
content="""
Now, if Android is varying the mtime it reports for files [...]
> I tried, using a directory special remote, touching a file in the remote after having already imported it once.
Hm, I think I will enable debug logging for a while and try to catch more info on my heisenbug.
It may take weeks though, so please note that a lack of activity in this issue does not mean I have abandoned it.
I will state so explicitly if that is ever the case.
> On the merge commits, importing creates one, and exporting creates one. So sync creates two.
> Also, if you export and then merge the remote tracking branch (a fast-forward merge), and then export again, it makes another merge commit.
Yes, and I hoped for a quick and dirty fix -- check the diff before merging -- and if it's empty -- skip that useless merge commit.
That would unblock my primary workflow and let me start using ADB in full, as I would stop fearing that I'll trash my history on all my remotes (as I mentioned, \"rebase\" won't help due to how \"git annex sync\" works).
But maybe, when the merge would be empty, it's still better to print something to the debug logs or as a warning -- so the original bug can still be tracked and I can continue searching for the root cause.
> See 1503b86a14865ce300ebb9c4d96315eeb254d0b8 (and subsequent 2bd0e07ed83db39907f0c824854d68c1a8ba77ac and a32f31235a67d572d989ad9e344efe11d78774a5) where this was introduced. This stuff makes my head hurt, and getting it wrong leads to broken merges from the remote tracking branch...
I skimmed through those diffs, and I must say my head hurts too :)
I will need to look more into the surrounding code to understand them in full.
Still, I will return to them once some debug logs have been collected.
Until then -- is it possible to do what I mentioned above -- \"check diff before merge -- and don't merge if it's empty\" ?
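A minimal sketch of such a check, done outside git-annex with plain git plumbing: two commits whose top-level trees are identical have an empty diff, so a merge between them would add history but no content. The helper names `tree_of` and `merge_adds_nothing` are made up for illustration; this is not how git-annex implements import/export merges.

```python
import subprocess


def tree_of(repo, rev):
    # Resolve a commit-ish to the ID of its top-level tree.
    proc = subprocess.run(
        ['git', '-C', repo, 'rev-parse', '--verify', rev + '^{tree}'],
        stdout=subprocess.PIPE, check=True)
    return proc.stdout.decode().strip()


def merge_adds_nothing(repo, ours, theirs):
    # True when both tips record the same tree, i.e. their diff is empty.
    return tree_of(repo, ours) == tree_of(repo, theirs)
```

A wrapper script could call `merge_adds_nothing(repo, 'master', ref)` with the remote tracking branch as `ref` and log a debug message instead of merging when it returns `True`.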
"""]]

@ -0,0 +1,49 @@
[[!comment format=mdwn
username="amerlyq"
avatar="http://cdn.libravatar.org/avatar/3d63c9f436b45570d45bd003e468cbd3"
subject="comment 6"
date="2022-01-08T14:15:55Z"
content="""
> open to being argued out of my current position
Ok, let's continue :)
> same problem:
> * checking the files into a git repository not using git-annex, and pulling from that repository.
> * running git-annex add and using git-annex get to transfer over a ssh connection.
> * [not supporting] workflow with adb or some other type of remote [is not a bug]
I distinguish ADB from all other types of remotes -- because it's the actual *source* of new files -- ones not yet processed manually by the user.
And what you mentioned above are scenarios occurring on a full-fledged work system, not on a half-baked Android phone.
When you sync Laptop with PC -- you must add files either on Laptop or on PC into git-annex first.
Therefore you have an opportunity to do something with the files first, e.g. sort them into folders by date,
before adding them into git and losing that mtime information (which at that point is still useful, but not necessary).
When you sync PC with any \"backup\" remote -- files are pushed/pulled *after* they were added to git-annex.
I.e. none of them adds new files that the user has never seen before -- they process only \"existing\" ones.
But when you use ADB (or maybe Directory too -- however I still don't have a use case for that) -- new files are added to ADB
directly, without user intervention. That is because it's a pain to sort them on the phone immediately without proper tools and scripts.
And one of the purposes of using git-annex here is to fetch them to the PC and sort them properly on a big screen.
But fetching them without \"pull -a\" loses the necessary information.
It's not that big of a problem for DCIM folder, as files contain dates in filenames, but it's an issue for Downloads (and separate folders of each chat app).
Therefore yes, ADB is different: it's involved in a different workflow, and therefore deserves different treatment.
---
> when you git-annex add a file, the mtime of the file (now a symlink) should also be unchanged
Ok, that's a different original reason. Agreed.
Still, it has a nice consequence of preserving mtime for files already present on PC.
And it allows me to scan the whole filesystem and dump metadata into a separate file (e.g. `find -printf '%T@ %P\n'`),
to preserve the information \"when I first saw/downloaded that file\" for the future.
And it's very important information (at least for me), because it's easier to remember and link related things occurring in a similar timespan,
than to sort files by type and then fruitlessly try to link those fragmented and sparse datasets inside my head after several months or years.
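The metadata dump mentioned above can also be sketched in Python, for systems where GNU `find -printf` is not available. This is only a rough equivalent under stated assumptions: it lists non-directory entries only, skips `.git`, and the function name `dump_mtimes` is made up for illustration. It uses `lstat`, so an annexed symlink is reported with its own mtime rather than that of its target.

```python
import os


def dump_mtimes(root):
    # Return sorted 'seconds.nanoseconds relative/path' lines for every
    # non-directory entry below root.
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git metadata.
        dirnames[:] = [d for d in dirnames if d != '.git']
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat reports the symlink itself, not the file it points to.
            ns = os.lstat(full).st_mtime_ns
            rel = os.path.relpath(full, root).replace(os.sep, '/')
            # Integer arithmetic keeps the full nanosecond precision.
            lines.append((rel, f'{ns // 10**9}.{ns % 10**9:09d} {rel}'))
    return [line for _, line in sorted(lines)]
```

Writing the returned lines to a file before running `git annex add` keeps a record of when each file was first seen, independent of what happens to the mtimes later.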
"""]]

@ -1,20 +1,16 @@
[[!comment format=mdwn
username="lell"
avatar="http://cdn.libravatar.org/avatar/4c4138a71d069e290240a3a12367fabe"
subject="Propagation is different between &quot;annex adjust&quot; and &quot;annex sync&quot;"
date="2022-01-07T10:21:46Z"
content="""
*This post is moved from the git-annex-adjust manpage, thanks @joey for the hint on where to put it*
Adjusted branches are important to my data science project, because my programs cannot deal with the read-only symlinks to annex'ed files.
But I find this command confusing, especially that
1. Calling on an unlocked adjusted branch, \"git annex adjust --unlock\" propagates commits back to the master branch differently than \"git annex sync --no-push --no-pull --no-content\" does.
1. Calling on an unlocked adjusted branch, `git annex adjust --unlock` propagates commits back to the master branch differently than `git annex sync --no-push --no-pull --no-content` does.
2. I can't find a way to \"un-adjust\" a branch without resorting to lower-level git commands.
2. I can't find a way to "un-adjust" a branch without resorting to lower-level git commands.
## Problem 1:
Say I have done `git annex adjust --unlock` and then have done more commits. The history now looks like this:
Say I have done `git annex adjust --unlock` and then have done more commits. The history now looks like this:
* My new commit 2 (HEAD -> adjusted/master(unlocked))
* My new commit 1 (HEAD -> adjusted/master(unlocked))
@ -22,8 +18,7 @@ Say I have done `git annex adjust --unlock` and then have done more commits. The
* Last old commit (master, basis/adjusted/master(unlocked))
* Previous commits
If I execute now `git annex adjust --unlock` again, the commits are propagated back to the original branch,
but my HEAD is still on the original adjusted branch. So both the master branch and my adjusted branch grow over time which clutters the history and is confusing.
If I execute now `git annex adjust --unlock` again, the commits are propagated back to the original branch, but my HEAD is still on the original adjusted branch. So both the master branch and my adjusted branch grow over time which clutters the history and is confusing.
* My new commit 2 (master)
| * My new commit 2 (HEAD -> adjusted/master(unlocked))
@ -46,17 +41,15 @@ On the other hand, if I do `git annex sync --no-push --no-pull --no-content`, th
* Last old commit
* Previous commits
This behaviour makes much more sense to me! Why does it take the modified `sync` command to do this? Why is this not done as well when re-calling `annex adjust --unlock`? The sync command seems a counter-intuitive place to do this, using the `adjust` command would be far more intuitive for me and I think also for other users.
This behaviour makes much more sense to me! Why does it take the modified sync command to do this? Why is this not done as well when re-calling `git annex adjust --unlock`? The sync command seems a counter-intuitive place to do this, using the adjust command would be far more intuitive for me and I think also for other users.
## Problem 2
I see no easy way of \"un-adjusting\" an adjusted branch. Currently I do
I see no easy way of "un-adjusting" an adjusted branch. Currently I do
git annex sync --no-push --no-pull --no-content
git checkout master
git branch -D \"adjusted/master(unlock)\"
git branch -D \"refs/basis/adjusted/master(unlock)\"
git branch -D "adjusted/master(unlock)"
git branch -D "refs/basis/adjusted/master(unlock)"
That's a lot of text for the inverse operation of `git annex adjust --unlock` and also I have to take care myself to not forget and loose commits I did on the adjusted branch. Did I miss an easier way? If not, I think it would be a great addition.
"""]]
That's a lot of text for the inverse operation of `git annex adjust --unlock` and also I have to take care myself to not forget and lose commits I did on the adjusted branch. Did I miss an easier way? If not, I think it would be a great addition.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="lell"
avatar="http://cdn.libravatar.org/avatar/4c4138a71d069e290240a3a12367fabe"
subject="comment 7"
date="2022-01-10T08:33:46Z"
content="""
Thanks for the help on where to put the question, joey. I have deleted and [moved](https://git-annex.branchable.com/forum/Propagate_changes_and_remove_adjusted_branch/?updated) my previous post.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="aaron"
avatar="http://cdn.libravatar.org/avatar/8a07e2f7af4bbf1bfcb48bbc53e00747"
subject="How does the gcrypt type compare to the rsync special remote"
date="2022-01-09T05:31:34Z"
content="""
For systems (such as rsync.net) that allow us to use both rsync and git-annex, which is the better option? I noticed that the default for rsync.net was this type (gcrypt). Does it provide additional capabilities or perform slightly better than the [rsync version](https://git-annex.branchable.com/special_remotes/rsync/)?
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="tomdhunt"
avatar="http://cdn.libravatar.org/avatar/02694633d0fb05bb89f025cf779218a3"
subject="comment 2"
date="2022-01-09T05:45:35Z"
content="""
The gcrypt special remote is for use with the git-remote-gcrypt protocol for making encrypted git remotes. The rsync special remote can also encrypt the files it stores, but it's not related to git-remote-gcrypt; it just puts files in a plain directory tree using rsync.
If you want to keep a repository remote (not a special remote) on your rsync.net host, and want it to be encrypted, then you can use git-remote-gcrypt and use this special remote so that the data is all together. If you're not using git-remote-gcrypt, then the rsync special remote is what you want.
"""]]

@ -0,0 +1,53 @@
[[!comment format=mdwn
username="aaron"
avatar="http://cdn.libravatar.org/avatar/8a07e2f7af4bbf1bfcb48bbc53e00747"
subject="comment 3"
date="2022-01-09T08:36:34Z"
content="""
@tomdhunt, are you saying that the difference is that the rsync remote contains only the files, without the history tracked by git, while the git-remote-gcrypt one also tracks history because it is a bare git repo?
Additionally, I just started trying out the gcrypt version on rsync.net, and it seems to use a slightly different initialization compared to the others. I've made some progress, but I am still not quite able to make it work; I seem to have issues initializing the bare remote when I do it via the terminal. If I don't create a bare repo and push it, the first commit fails completely. I seem to make more progress by creating a bare repo, pushing it, and then adding it (but it still fails). This is what I have so far:
```bash
user@localhost:$ sudo chown <rsync.net user>:<rsync.net user> -R
user@localhost:$ git init --bare --shared=group test_repo.git
user@localhost:$ sudo rsync -vrSP test_repo.git <rsync.net user>@<server>:annex
user@localhost:$ git annex initremote \"<some_userful_name>\" type=gcrypt gitrepo=<rsync.net user>@<server>:annex chunk=1MiB keyid=<key_id> encryption=shared mac=HMACSHA512 autoenable=true
```
The error message that I get:
```bash
user@localhost:$ git annex sync
commit
On branch master
Initial commit
nothing to commit (create/copy files and use \"git add\" to track)
ok
pull <rsync.net server>
gcrypt: Decrypting manifest
gpg: selecting card failed: No such device
gpg: Signature made Sun 09 Jan 2022 08:26:18 AM GMT
gpg: using EDDSA key <key>
gpg: Good signature from \"<key comment>\" [ultimate]
merge: refs/remotes/<remote name>/master - not something we can merge
merge: refs/remotes/<remote name>/synced/master - not something we can merge
failed
sync: 1 failed
```
It also looks like this method fails to add `gcrypt-participants = <key>` and `gcrypt-signingkey = <key>` to the `.git/config` file like the webapp does.
Furthermore, when I use `git annex webapp` to generate the repo, it does something even more different (and successfully creates the bare repo by itself). Specifically, the URL looks something like this:
```bash
url = gcrypt::<rsync.net user>@git-annex-.<country>.2D<server_subname?>.2E<server_name>.2E<server_domain>-<rsync.net user>_22_annex:annex/
```
It seems to be encoding some of the characters to make a URL? Is there another web API that we can interact with?
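A small sketch for reading such a name, under the assumption that each `.XX` sequence is a two-digit hexadecimal escape for one character (0x2D is a hyphen, 0x2E is a dot, which fits the URL above). The function name `unmangle` and the example host are hypothetical, and the exact escaping rules used by the webapp are not confirmed here.

```python
import re


def unmangle(name):
    # Replace each '.XX' (two hex digits) with the character it encodes.
    return re.sub(r'\.([0-9A-Fa-f]{2})',
                  lambda m: chr(int(m.group(1), 16)),
                  name)
```

Under that assumption, `unmangle('git-annex-xx.2Dhost.2Eexample.2Enet-user_22_annex')` gives `git-annex-xx-host.example.net-user_22_annex`, which suggests the name packs the original host, the user, port 22, and the directory into a single host alias.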
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="aaron"
avatar="http://cdn.libravatar.org/avatar/8a07e2f7af4bbf1bfcb48bbc53e00747"
subject="Not auto-signing commits with webapp (and possible assistant)"
date="2022-01-09T04:29:26Z"
content="""
Are we still concerned about this, and does it still apply if we are using encrypted remotes? I do have my normal git setup configured to use GPG signing because I like the concept, but I noticed that when running `git annex webapp` the automated commits seem to bypass my global git config and are not signed.
"""]]