Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2022-01-11 12:25:12 -04:00
commit c031d19c32
GPG key ID: DB12DB0FF05F8F38
9 changed files with 233 additions and 18 deletions

@ -0,0 +1,57 @@
I am working with a bare repository to transfer two keys from a custom backend to and from a special remote. This seems to be working fine.
In order to be able to make use of export remotes (exporttree=yes), I need to be able to specify a tree to be exported. For technical reasons, I want to keep using a bare repository, and use `hash-object`, `update-index`, and `write-tree` manually in order to create a tree. The Python code snippet that does this looks like this:
```
for key, prefix, fname in (
        # the prefixes are constant hashdir-lower
        (RepoAnnexGitRemote.refs_key, 'a11/1c8', '.datalad/dotgit/refs'),
        (RepoAnnexGitRemote.repo_export_key, '6b2/c13',
         '.datalad/dotgit/repo-export.zip')):
    # create a blob for the annex link
    out = repo._git_runner.run(
        ['git', 'hash-object', '-w', '--stdin'],
        stdin=bytes(
            f'../../.git/annex/objects/{prefix}/{key}/{key}', 'utf-8'),
        protocol=StdOutCapture)
    linkhash = out['stdout'].strip()
    # place link into a tree
    out = repo._git_runner.run(
        ['git', 'update-index', '--add', '--cacheinfo', '120000',
         linkhash, fname],
        protocol=StdOutCapture)
# write the complete tree, and return ID
out = repo._git_runner.run(
    ['git', 'write-tree'],
    protocol=StdOutCapture)
exporttree = out['stdout'].strip()
```
It essentially creates the two blobs for the annex links, puts them together in a tree, and writes it to the repo.
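To make the content of those link blobs concrete, here is a minimal, self-contained sketch of what goes into each blob and which object ID `git hash-object --stdin` would report for it. The helper names `annex_link_target` and `git_blob_id` are made up for illustration and are not part of git-annex or DataLad; the sketch also assumes a repository using the default SHA-1 object format.

```python
import hashlib
from pathlib import PurePosixPath


def annex_link_target(fname, prefix, key):
    """Build the symlink target stored in the blob for an annexed file.

    One '../' is needed per directory level between the file and the
    top of the tree, so '.datalad/dotgit/refs' gets '../../'.
    (Hypothetical helper, mirrors the f-string in the snippet above.)
    """
    depth = len(PurePosixPath(fname).parent.parts)
    return '../' * depth + f'.git/annex/objects/{prefix}/{key}/{key}'


def git_blob_id(data):
    """Compute the ID git assigns to a blob with this content.

    Git hashes a 'blob <size>' header, a NUL byte, then the content.
    Assumes the SHA-1 object format.
    """
    header = b'blob %d\0' % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == '__main__':
    # 'SOME-KEY' is a placeholder, not a real git-annex key
    target = annex_link_target('.datalad/dotgit/refs', 'a11/1c8', 'SOME-KEY')
    print(target)
    print(git_blob_id(target.encode('utf-8')))
```

The printed ID should match what `hash-object` returns for the same bytes, which makes it possible to check the link content without touching the repository.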
However, after this code has run, git-annex is no longer operating properly in the bare repo:
```
% git annex drop --all
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
fatal: relative path syntax can't be used outside working tree
git-annex: fd:21: Data.ByteString.hGetLine: end of file
```
(fatal error messages are from cat-file batch calls inside)
When I comment this code out, everything goes back to normal. It seems to make no difference whether I follow the problematic code up with a `commit-tree` and `update-ref` to actually have the mainline branch point to a commit with that tree. It also seems to make no difference when I explicitly `setpresentkey <key> <here> 0`.
AFAICS this creates the same records as if I had done this in a regular worktree using high-level git-annex tooling. Other git-annex commands like `fsck` seem to be working fine. If I create a branch with that tree, `findref` also seems to be working properly.
Is this a bug, or am I doing something wrong? Thanks in advance for your time!

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="amerlyq"
avatar="http://cdn.libravatar.org/avatar/3d63c9f436b45570d45bd003e468cbd3"
subject="comment 6"
date="2022-01-08T14:32:11Z"
content="""
Now, if Android is varying the mtime it reports for files [...]
> I tried, using a directory special remote, touching a file in the remote after having already imported it once.
Hm, I think I will enable debug logging for a while, and will try to catch more info for my heisenbug.
It may take weeks though, so please know that no activity in this issue does not mean I have abandoned it.
I will explicitly state so, if that ever becomes the case.
> On the merge commits, importing creates one, and exporting creates one. So sync creates two.
> Also, if you export and then merge the remote tracking branch (a fast-forward merge), and then export again, it makes another merge commit.
Yes, and I hoped for a quick and dirty fix -- check the diff before the merge -- and if it's empty -- skip that useless merge commit.
It would unblock my primary workflow and let me start using ADB in full, as I would stop fearing to trash my history on all my remotes (as I mentioned, \"rebase\" won't help due to how \"git annex sync\" works).
But maybe when a merge is skipped it would still be better to print something to the debug logs or as a warning -- so the original bug could still be tracked and I could continue searching for the root cause.
> See 1503b86a14865ce300ebb9c4d96315eeb254d0b8 (and subsequent 2bd0e07ed83db39907f0c824854d68c1a8ba77ac and a32f31235a67d572d989ad9e344efe11d78774a5 where this was introduced. This stuff makes my head hurt, and getting it wrong leads to broken merges from the remote tracking branch...
I skimmed through those diffs, and I must say my head hurts too :)
And I will need to look more into the surrounding code to understand them in full.
Still, I will return to them once some debug logs have been collected.
Until then -- is it possible to do what I mentioned above -- \"check diff before merge -- and don't merge if it's empty\" ?
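
The check asked about above could be sketched roughly as follows. This is only an illustration under assumptions, not how git-annex implements import or export merges: `tree_id` and `merge_changes_content` are hypothetical helper names, and comparing the two tree IDs is a simplification that treats the diff as empty exactly when both commits point to the same tree.

```python
import subprocess


def tree_id(ref, repo='.'):
    # Resolve a commit-ish to the ID of its top-level tree object.
    result = subprocess.run(
        ['git', '-C', repo, 'rev-parse', '--verify', f'{ref}^{{tree}}'],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()


def merge_changes_content(ours, theirs, resolve=tree_id):
    # Two commits with the same tree ID have an empty diff between them,
    # so a merge commit joining them would not change any file content.
    # 'resolve' is injectable so the logic can be tried without a repo.
    return resolve(ours) != resolve(theirs)
```

A caller would run the merge only when `merge_changes_content('master', 'phone/master')` returns true, and could log a debug message otherwise; both ref names here are placeholders.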
"""]]

@ -0,0 +1,49 @@
[[!comment format=mdwn
username="amerlyq"
avatar="http://cdn.libravatar.org/avatar/3d63c9f436b45570d45bd003e468cbd3"
subject="comment 6"
date="2022-01-08T14:15:55Z"
content="""
> open to being argued out of my current position
Ok, let's continue :)
> same problem:
> * checking the files into a git repository not using git-annex, and pulling from that repository.
> * running git-annex add and using git-annex get to transfer over a ssh connection.
> * [not supporting] workflow with adb or some other type of remote [is not a bug]
I distinguish ADB from all other types of remotes -- because it's the actual *source* of new files -- not yet processed manually by the user.
And what you mentioned above -- are scenarios occurring on a full-fledged work system, not on a half-baked Android phone.
When you sync Laptop with PC -- you must add files either on Laptop or on PC into git-annex first.
Therefore you have an opportunity to do something with files first e.g. sort them into the folders by date,
before adding them into git and losing that mtime information (which at that point is still useful, but not necessary).
When you sync PC with any \"backup\" remote -- files are pushed/pulled *after* they were added to git-annex.
I.e. none of them adds new files that the user has never seen before -- they only process \"existing\" ones.
But when you use ADB (or maybe Directory too -- however I still don't have a use case for that) -- new files are added to ADB
directly, avoiding user intervention. Because it's a pain to sort them on the phone immediately without proper tools and scripts.
And one of the purposes of using git-annex here -- is to fetch them to the PC to sort them properly on a big screen.
But fetching them without \"pull -a\" loses the necessary information.
It's not that big of a problem for DCIM folder, as files contain dates in filenames, but it's an issue for Downloads (and separate folders of each chat app).
Therefore yes, ADB is different, it's involved in a different workflow, and therefore deserves different treatment.
---
> when you git-annex add a file, the mtime of the file (now a symlink) should also be unchanged
Ok, that's a different original reason. Agreed.
Still, it has a nice consequence of preserving mtime for files already present on PC.
And it allows me to scan the whole filesystem and dump metadata into a separate file (e.g. \"find -printf '%T@ %P\n'\"),
to preserve the information \"when I first saw/downloaded that file\" for the future.
And it's very important information (at least for me), because it's easier to remember and link related things occurring in a similar timespan,
than to sort files by type and then fruitlessly try to link those fragmented and sparse datasets inside my head after several months or years.
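
For illustration, the `find -printf` style dump mentioned above could also be done with a few lines of Python. This is a rough sketch under assumptions: `dump_mtimes` and `write_manifest` are made-up names, and unlike `find` it only lists the entries that `os.walk` reports as files (directories themselves are skipped).

```python
import os


def dump_mtimes(root):
    # Yield (mtime, path relative to root) for each file-like entry.
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so an annexed symlink
            # reports its own mtime rather than that of its target
            yield os.lstat(full).st_mtime, os.path.relpath(full, root)


def write_manifest(root, manifest):
    # One line per file: seconds since the epoch, a space, the path.
    with open(manifest, 'w', encoding='utf-8') as out:
        for mtime, rel in dump_mtimes(root):
            out.write(f'{mtime:.9f} {rel}\n')
```

Running `write_manifest` before `git annex add` keeps a record of the original timestamps alongside the repository.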
"""]]