Merge branch 'master' into ospath

Joey Hess 2025-02-11 16:56:17 -04:00
commit bab26da74b
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 120 additions and 3 deletions


@ -19,6 +19,7 @@ import Prelude
import Utility.PartialPrelude
import Utility.Directory
import Utility.SystemDirectory
import Utility.Process
import Utility.Monad
import Utility.SafeCommand


@ -0,0 +1,54 @@
### Please describe the problem.
I have a repository that contains thousands (30000) of unlocked files. The problem is that
git keeps refreshing its index even though the modification times have not changed compared to the git index.
I think this should not happen. It significantly slows down any git operation (a delay of ~10s in my case, but
potentially worse on Windows or NFS file systems). This includes
the display of a bash prompt that shows git information (via the PROMPT_COMMAND env var) and bash
completion after things like `git add`, so it is quite annoying.
### What steps will reproduce the problem?
I set up a fresh repository to try to reproduce the problem, but there everything works as expected. I am looking for
ideas on how to find the problem with my original repository.
```
git init .
git annex init
git annex config --set annex.addunlocked include="py_env/*"
python -m venv py_env
. py_env/bin/activate
pip install datalad # some package with dependencies that creates many files in py_env
git annex add py_env
git status
git status
git status
...
```
The only strange thing I have found is that git annex forwards all `__init__.py` files to the smudge/clean filter (via pipes to filter-process), but only those! It does not do this for any of the other files in `py_env`, as you can check with
```sh
strace git status 2>&1 | grep "^write.*pathname="
```
This shows only the `__init__.py` files, even though `find py_env` lists many more files, which are skipped. I think that git skips files during `git status` if their mtime and ctime are the same as in the index. But they are always the same, as can be checked by comparing
```sh
git ls-files --debug "(any file that shows up in strace)"
stat -c $'ctime: %.9Z\nmtime: %.9Y' "(same file)"
```
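The comparison above could be automated along the following lines. This is only a minimal sketch: it assumes that `git ls-files --debug` prints `ctime:` and `mtime:` lines in `seconds:nanoseconds` form, and the function names (`parse_index_times`, `stat_times`, `mismatches`) are hypothetical helpers, not part of git or git-annex.

```python
import os
import re
import subprocess
import sys

# Assumed line shape in `git ls-files --debug` output: "  mtime: <sec>:<nsec>"
_TIME_LINE = re.compile(r"^[ \t]*(ctime|mtime): (\d+):(\d+)[ \t]*$", re.MULTILINE)


def parse_index_times(debug_output):
    """Return {"ctime": (sec, nsec), "mtime": (sec, nsec)} parsed from debug text."""
    return {name: (int(sec), int(nsec))
            for name, sec, nsec in _TIME_LINE.findall(debug_output)}


def stat_times(path):
    """Return the on-disk ctime and mtime of path as (sec, nsec) pairs."""
    st = os.lstat(path)
    return {"ctime": divmod(st.st_ctime_ns, 10**9),
            "mtime": divmod(st.st_mtime_ns, 10**9)}


def mismatches(path):
    """Return the timestamps that differ between the git index and the file system."""
    out = subprocess.run(["git", "ls-files", "--debug", "--", path],
                         check=True, capture_output=True, text=True).stdout
    index, disk = parse_index_times(out), stat_times(path)
    return {name: (index.get(name), disk[name])
            for name in ("ctime", "mtime") if index.get(name) != disk[name]}


if __name__ == "__main__":
    # Pass the files that show up in the strace output as arguments.
    for path in sys.argv[1:]:
        print(path, mismatches(path) or "timestamps match the index")
```

Note that git also looks at other fields printed by `--debug`, such as size and inode, so matching timestamps alone may not be conclusive.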
I do not know how to debug this further and am grateful for suggestions!
### What version of git-annex are you using? On what operating system?
git-annex version: 10.20250115-ga55e1da1aa0eb07b418e899cc200027e082eb82a
git version 2.47.1
Rocky Linux
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)


@ -0,0 +1,45 @@
### Please describe the problem.
While selectively auditing the files annexed on an S3 container, where files are stored by key name, I attempted to confirm that one file is also annexed locally.
When I run `git-annex whereused --key <keyname>`, nothing comes up. But if I look for the file locally, I can see that it exists.
If I add `--historical`, I can see that the file exists in some previous commit that I thought had been successfully merged and moved on from. This might be related to export trees, because that keyword also appears in the output with `--historical`:
[[!format sh """
$ git-annex whereused --key SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
$ git-annex whereused --historical --key SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG remotes/origin/git-annex~12:export.tree/2010-08-21/042.JPG
$ git-annex contentlocation SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
.git/annex/objects/0Z/Z3/SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG/SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
$ ls -l $(git-annex contentlocation SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG)
-r--r--r-- 1 shaddy shaddy 1000013 May 21 2024 .git/annex/objects/0Z/Z3/SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG/SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
$ git-annex findkeys | grep -F SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
$ ls -l 2010-08-21/042.JPG
lrwxrwxrwx 1 shaddy shaddy 201 May 20 2024 2010-08-21/042.JPG -> ../.git/annex/objects/0Z/Z3/SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG/SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931eefae2b8f7647f5f14.JPG
$ ls -lL 2010-08-21/042.JPG
-r--r--r-- 1 shaddy shaddy 1000013 May 21 2024 2010-08-21/042.JPG
$
"""]]
My reading of the documentation is that the file being present in the local annex shouldn't require the --historical argument, which is much slower.
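To make the `--historical` result easier to interpret, its output line can be split into key, ref and path. The following is only a rough sketch based on the single output line shown above; the names `WhereUsedHit`, `parse_whereused_line` and `is_export_tree_hit` are hypothetical, and the `export.tree/` check is a heuristic assumption rather than documented git-annex behaviour.

```python
from typing import NamedTuple, Optional


class WhereUsedHit(NamedTuple):
    key: str
    ref: Optional[str]  # None when the line carries no "ref:" part
    path: str


def parse_whereused_line(line):
    """Split one line of the form "KEY REF:PATH" (or "KEY PATH") into its parts."""
    key, _, location = line.strip().partition(" ")
    if ":" in location:
        ref, _, path = location.partition(":")
        return WhereUsedHit(key, ref, path)
    return WhereUsedHit(key, None, location)


def is_export_tree_hit(hit):
    """Heuristic: the hit is inside export.tree/ on a git-annex branch."""
    if hit.ref is None:
        return False
    # Drop ancestry suffixes such as "~12" or "^2" to get the branch name.
    branch = hit.ref.split("~")[0].split("^")[0]
    return branch.endswith("git-annex") and hit.path.startswith("export.tree/")


if __name__ == "__main__":
    line = ("SHA256E-s1000013--e435522a9059bcb086b6db5fa5f05a06913266772a7931"
            "eefae2b8f7647f5f14.JPG "
            "remotes/origin/git-annex~12:export.tree/2010-08-21/042.JPG")
    hit = parse_whereused_line(line)
    print(hit.ref, hit.path, is_export_tree_hit(hit))
```

For the line above, the ref is an old commit of the remote's git-annex branch rather than a commit of the working branch, which fits the guess that the only hit comes from an export tree record.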
### What steps will reproduce the problem?
As per above
### What version of git-annex are you using? On what operating system?
git-annex/10.20241031-1~ndall+1 on Ubuntu 22.04 LTS:
Linux computer-ubul 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
### Please provide any additional information below.
Nil
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Love git-annex. Long time supporter.


@ -0,0 +1,16 @@
[[!comment format=mdwn
username="thk"
avatar="http://cdn.libravatar.org/avatar/bfef10a428769701aeee1db978951461"
subject="iroh"
date="2025-02-08T06:56:31Z"
content="""
Please take a look at [iroh](https://www.iroh.computer). It started as an IPFS implementation in Rust, but the project realized that IPFS is slow and overengineered and has now pivoted to providing p2p connections over QUIC.
* Peers/nodes/endpoints use ed25519 keys as identities.
* The iroh project hosts relay servers for initial NAT hole punching and as connection fallbacks.
* So far there are 4 initial [discovery](https://www.iroh.computer/docs/concepts/discovery) implementations: DNS, Local (mDNS), [Pkarr](https://github.com/nuhvi/pkarr), and BitTorrent's Mainline DHT.
I'm waiting for their [FOSDEM talk](https://fosdem.org/2025/schedule/event/fosdem-2025-6053-building-peer-to-peer-quic/). There is also a good presentation on YouTube: [A tour of iroh](https://www.youtube.com/watch?v=AkHaIVuFHK4&list=PLvsg-fc7APc3TOrumoQdjggwoLTSfiqRE&index=2).
"""]]


@ -1,8 +1,9 @@
I'm thk at debian org
https://blog.koch.ro
# My TODO items
- implement p2p for annex with iroh: https://git-annex.branchable.com/todo/generic_p2p_socket_transport/#comment-de273c852db02cb46a6bab4987429a3a
- write a tip on using git worktree to inspect the git-annex branch
- Is there a way to filter out the directories?
- write a tip on how to deal with permission issues on ext formatted USB drives


@ -11,10 +11,10 @@ flags:
benchmark: true
crypton: true
servant: true
ospath: true
ospath: false
packages:
- '.'
resolver: nightly-2025-01-20
resolver: lts-23.2
extra-deps:
- filepath-bytestring-1.5.2.0.2
- aws-0.24.4