Merge branch 'master' into balanced

Joey Hess 2024-08-30 11:01:39 -04:00
commit d0938d730b
GPG key ID: DB12DB0FF05F8F38
28 changed files with 541 additions and 16 deletions

@ -1,6 +1,5 @@
 git-annex (10.20240831) UNRELEASED; urgency=medium
-  * export: Added --from option.
   * Special remotes configured with exporttree=yes annexobjects=yes
     can store objects in .git/annex/objects, as well as an exported tree.
   * Support proxying to special remotes configured with
@ -10,22 +9,24 @@ git-annex (10.20240831) UNRELEASED; urgency=medium
     special remote (or it is a cluster node) and the configured
     remote.name.annex-tracking-branch is received, the tree is
     exported to the special remote.
-  * updateproxy, updatecluster: Prevent using an exporttree=yes special
-    remote that does not have annexobjects=yes, since it will not work.
-  * git-remote-annex: Store objects in exportree=yes special remotes
-    in the same paths used by annexobjects=yes. This is a backwards
-    compatible change.
-  * The config versioning=true is now reserved for use by versioned special
-    remotes. External special remotes should not use that config for their
-    own purposes.
-  * Support "balanced=" and "fullybalanced=" in preferred content expressions.
-  * Support "sizebalanced=" and "fullysizebalanced=" too.
+  * Support "balanced=", "fullybalanced=", "sizebalanced=" and
+    "fullysizebalanced=" in preferred content expressions.
   * Added --rebalance option.
   * maxsize: New command to tell git-annex how large the expected maximum
     size of a repository is, and to display repository sizes.
   * Added the annex.fullybalancedthreshhold git config.
   * vicfg: Include maxsize configuration.
-  * info: Improved speed.
+  * info: Improved speed by using new repository size tracking.
+  * lookupkey: Allow using --ref in a bare repository.
+  * export: Added --from option.
+  * git-remote-annex: Store objects in exportree=yes special remotes
+    in the same paths used by annexobjects=yes. This is a backwards
+    compatible change.
+  * updateproxy, updatecluster: Prevent using an exporttree=yes special
+    remote that does not have annexobjects=yes, since it will not work.
+  * The config versioning=true is now reserved for use by versioned special
+    remotes. External special remotes should not use that config for their
+    own purposes.
  -- Joey Hess <id@joeyh.name> Wed, 31 Jul 2024 15:52:03 -0400

@ -15,7 +15,7 @@ import Utility.Terminal
 import Utility.SafeOutput
 cmd :: Command
-cmd = notBareRepo $ noCommit $ noMessages $
+cmd = noCommit $ noMessages $
 	command "lookupkey" SectionPlumbing
 		"looks up key used for file"
 		(paramRepeating paramFile)
@ -35,9 +35,11 @@ optParser = LookupKeyOptions
 run :: LookupKeyOptions -> SeekInput -> String -> Annex Bool
 run o _ file
 	| refOption o = catKey (Ref (toRawFilePath file)) >>= display
-	| otherwise = seekSingleGitFile file >>= \case
-		Nothing -> return False
-		Just file' -> catKeyFile file' >>= display
+	| otherwise = do
+		checkNotBareRepo
+		seekSingleGitFile file >>= \case
+			Nothing -> return False
+			Just file' -> catKeyFile file' >>= display
 display :: Maybe Key -> Annex Bool
 display (Just k) = do

@ -0,0 +1,42 @@
### Please describe the problem.
In a git annex assistant managed repo, Calibre's delete book action causes the file to be committed to git instead of annexed.
### What steps will reproduce the problem?
I have put my entire calibre library into a git annex. I use the assistant in adjusted mode so that all files can be unlocked and calibre is free to do what it wants with the filesystem. I use btrfs to save storage space (but I don't think that part matters).
If I have a book in the Calibre library and then I delete it, Calibre uses some method to move the files to a directory called .caltrash (so that the book can be recovered later). When it does that, git annex does a git add instead of a git annex add. This makes me sad.
I have no .gitignore file.
### What version of git-annex are you using? On what operating system?
```
xentac@baxter:~/calibre$ git annex version
git-annex version: 10.20240430-1~ndall+1
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
```
```
xentac@baxter:~/calibre$ cat /etc/issue
Ubuntu 24.04 LTS \n \l
```
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="xentac"
avatar="http://cdn.libravatar.org/avatar/773b6c7b0dc34f10b66aa46d2730a5b3"
subject="comment 1"
date="2024-08-09T19:31:12Z"
content="""
I stopped the assistant, checked out master, `git reset HEAD^`, and ran git annex add .caltrash and it told me this:
```
add .caltrash/b/592/metadata.opf (non-large file; adding content to git repository) ok
```
In this case it is just a small .opf file, but in a previous case, it added an entire pdf. Either way, I want all the files in git annex.
So I double checked that git annex.largefiles wasn't set:
```
xentac@baxter:~/calibre$ git config annex.largefiles
xentac@baxter:~/calibre$
```
Then I set git annex.largefiles to anything
```
xentac@baxter:~/calibre$ git config annex.largefiles anything
xentac@baxter:~/calibre$
```
But it still wants to add the opf to the git repository!
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de"
subject="comment 2"
date="2024-08-16T15:45:45Z"
content="""
By default, git-annex puts all dotfiles into git and doesn't consider the largefiles setting for them. I think that is what you are seeing. You can change that behavior with the `annex.dotfiles` configuration.
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="xentac"
avatar="http://cdn.libravatar.org/avatar/773b6c7b0dc34f10b66aa46d2730a5b3"
subject="comment 3"
date="2024-08-18T03:17:12Z"
content="""
Thanks! That does seem to have been it.
Do you know if there's a way to only treat some dotfiles as annexed? Like I want .caltrash and .calnotes to be annexed, but maybe not if I create .gitattributes.
"""]]

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de"
subject="comment 4"
date="2024-08-19T10:25:13Z"
content="""
When `annex.dotfiles` is set to true, git-annex should treat dotfiles just like other files, so it should apply .gitattributes to them. With that in mind, I'd expect something like this to work:
```
.* annex.largefiles=nothing
.caltrash annex.largefiles=anything
.calnotes annex.largefiles=anything
```
I haven't tested it though.
"""]]

@ -0,0 +1,31 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/6ac9c136a74bb8760c66f422d3d6dc32"
subject="comment 9"
date="2024-08-15T15:40:19Z"
content="""
I just hit this bug again and it's even nastier than I remembered.
I also found a super simple reproducer:
1. Have two machines A and B
2. Init a git-annex repo on A
3. Clone the git-annex repo on B (`git clone ssh://A:/tmp/testrepo`)
4. Make A unreachable for B (i.e. `systemctl suspend`)
5. Execute `git annex info` on B.
6. It hangs forever
I have not found a way to get out of this situation (`--fast` does not help) other than restoring the connection to A which is sometimes simply not possible.
```
git-annex version: 10.20240701
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.6 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
```
"""]]

@ -0,0 +1,67 @@
### Please describe the problem.
Wanted to use `metadata` (to annotate anatomical T1s with metadata), and then tried `get` on a pathspec.
`git annex` then incorrectly claims that no files match although I show with `git ls-files` on the same pathspec that there are files:
```shell
git annex version
git-annex version: 10.20240731+git17-g6d1592f857-1~ndall+1
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
...
git ls-files '**/*.nii.gz' | head -n 1
sub-0001/ses-01/anat/sub-0001_ses-01_acq-MPRAGEXp3X08mm_T1w.nii.gz
git annex metadata '**/*.nii.gz'
error: pathspec './**/*.nii.gz' did not match any file(s) known to git
Did you forget to 'git add'?
metadata: 1 failed
# git-annex changed pathspec to have leading ./ -- let's try with that too:
git ls-files './**/*.nii.gz' | head -n 1
sub-0001/ses-01/anat/sub-0001_ses-01_acq-MPRAGEXp3X08mm_T1w.nii.gz
# annex get -- the same story
git annex get '**/*.nii.gz'
error: pathspec './**/*.nii.gz' did not match any file(s) known to git
Did you forget to 'git add'?
(merging typhon/git-annex into git-annex...)
(recording state in git...)
get: 1 failed
```
From `annex --debug` we can see that annex unconditionally uses `--literal-pathspecs`
```shell
git annex --debug get '**/*.nii.gz'
[2024-08-23 21:29:36.951044831] (Utility.Process) process [3889124] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","./**/*.nii.gz"]
```
so, I think then annex should have at least used "literal" in the error, e.g.
```
error: literal pathspec './**/*.nii.gz' did not match any file(s) known to git
```
and ideally also hinted at how to disable such behavior (if possible) and allow "magical" etc. pathspecs there.
FWIW, I have tried with `GIT_GLOB_PATHSPECS=1` env var but that didn't help.... not sure if possible at all looking at the code
```
fixupRepo :: Repo -> GitConfig -> IO Repo
fixupRepo r c = do
let r' = disableWildcardExpansion r
r'' <- fixupUnusualRepos r' c
if annexDirect c
then return (fixupDirect r'')
else return r''
{- Disable git's built-in wildcard expansion, which is not wanted
- when using it as plumbing by git-annex. -}
disableWildcardExpansion :: Repo -> Repo
disableWildcardExpansion r = r
{ gitGlobalOpts = gitGlobalOpts r ++ [Param "--literal-pathspecs"] }
```
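The difference between the two matching modes can be sketched outside of git. The snippet below is only an illustration in Python, not git-annex code: `literal_match` and `glob_match` are hypothetical helpers, and `fnmatch` is a rough stand-in for git's glob pathspecs (its `*` also crosses `/`). It shows why a literal pathspec finds nothing in this report, while expanding the glob first yields plain paths that can then be handed over as literal pathspecs.

```python
from fnmatch import fnmatchcase


def _strip_dot_slash(pathspec):
    # git-annex prefixed the pathspec with "./" in the transcript above.
    return pathspec[2:] if pathspec.startswith("./") else pathspec


def literal_match(tracked, pathspec):
    """Literal mode: wildcard characters have no special meaning.

    A path matches only if it is spelled exactly like the pathspec,
    or lives under a directory spelled exactly like it.
    """
    wanted = _strip_dot_slash(pathspec)
    return [p for p in tracked if p == wanted or p.startswith(wanted + "/")]


def glob_match(tracked, pathspec):
    """Glob mode, approximated with fnmatch (an assumption, not git's matcher)."""
    pattern = _strip_dot_slash(pathspec)
    return [p for p in tracked if fnmatchcase(p, pattern)]


# File name taken from the report; "README" is a made-up second entry.
tracked = [
    "sub-0001/ses-01/anat/sub-0001_ses-01_acq-MPRAGEXp3X08mm_T1w.nii.gz",
    "README",
]

print(literal_match(tracked, "./**/*.nii.gz"))  # nothing is literally named "**/*.nii.gz"
print(glob_match(tracked, "./**/*.nii.gz"))     # the .nii.gz file is found
```

In practice the expansion step would be done by `git ls-files '**/*.nii.gz'`, as in the transcript, with the resulting file names passed to `git annex`.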
[[!meta author=yoh]]
[[!tag projects/openneuro]]

@ -0,0 +1,40 @@
### Please describe the problem.
`git annex lookupkey --ref` refuses to run in a bare repository, even though the `--ref` option was added specifically for that use-case.
### What steps will reproduce the problem?
`git annex lookupkey --ref` in a bare repository.
### What version of git-annex are you using? On what operating system?
```
git-annex version: 10.20240732-g1e0f13ad7ffed0d75e3944d6189d984faefdb4af
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.33 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.7 http-client-0.7.14 persistent-sqlite-2.13.2.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
```
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
$ git annex lookupkey --ref 84ec55fbac12fd09a679fca6f94ea98be83356df
git-annex: You cannot run this command in a bare repository.
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="mih"
avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd"
subject="Needed to retrieve single file metadata from bare repo"
date="2024-08-28T13:58:30Z"
content="""
I ran into the same issue. My actual goal is to retrieve git-annex metadata for a specific file from a bare repo. I only know branch/commit and the path. `git-annex metadata` can only report for a tree or a key. For the former I need to implement path matching for a potentially voluminous output. For the latter I need to look up the key -- which currently is not supported for a bare repo.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de"
subject="comment 2"
date="2024-08-28T14:11:36Z"
content="""
@mih if you need a workaround now, you can parse the key from `git show <branch/commit/tag>:<path>` or even just `git show <blob-id-of-the-file>`. In the case of locked files, it will return something like `.git/annex/objects/.../.../<key>/<key>` (i.e. the symlink target), and in the case of unlocked files it is something like `/annex/objects/<key>`. This is what forgejo-aneksajo does here: <https://codeberg.org/matrss/forgejo-aneksajo/src/branch/forgejo/modules/annex/annex.go#L48-L105>. `lookupkey --ref` would massively simplify that code though.
"""]]
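The workaround described in the comment above can be sketched as follows. This is a minimal Python illustration, not the code forgejo-aneksajo uses: `key_from_git_show` is a hypothetical helper, and it assumes the text passed in is what `git show <ref>:<path>` printed, i.e. either a symlink target ending in `.../<key>/<key>` or a pointer of the form `/annex/objects/<key>`.

```python
import posixpath
from typing import Optional


def key_from_git_show(blob_text: str) -> Optional[str]:
    """Return the annex key named by a symlink target or pointer file.

    Returns None when the text does not look like an annex link,
    for example when the file content itself is stored in git.
    """
    lines = blob_text.splitlines()
    if not lines:
        return None
    # Only the first line is examined.
    first = lines[0].strip()
    if "annex/objects/" not in first:
        return None
    # In both layouts the key is the last path component.
    key = posixpath.basename(first)
    return key or None


# Unlocked file: the pointer text from one of the bug reports on this page.
print(key_from_git_show(
    "/annex/objects/MD5E-s87104--942e5878169ea672dc8ab47889694974.txt\n"))

# Locked file: a made-up symlink target with placeholder hash directories.
print(key_from_git_show(
    "../../.git/annex/objects/Xx/Yy/SHA256E-s5--abc.txt/SHA256E-s5--abc.txt"))
```

With `lookupkey --ref` working in bare repositories, this parsing should no longer be needed there.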

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-08-30T14:47:41Z"
content="""
Fixed that.
It kind of seems like metadata could have an option to get the metadata for
a specific ref as well, but since it already has --branch which takes a
branch ref, adding a --ref which takes a file ref seems confusing. Maybe
--fileref? There are a decent number of other commands that also use
parseKeyOptions to support --branch/--key/--all that would also get the new
option if it were implemented.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="bvaa"
avatar="http://cdn.libravatar.org/avatar/1c36fa5fed5065f59842ebce35b10299"
subject="comment 2"
date="2024-08-14T07:18:25Z"
content="""
I have the same issue but with an external special remote that claims some https: URL based on a specific domain name.
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="Most servers upgraded to TLS v1.2 EMS / TLS v1.3"
date="2024-08-13T00:01:05Z"
content="""
For the record, this problem has largely \"solved itself\" by waiting -- one of the podcast feeds I was having problems with upgraded to a new server (ie, off CentOS 7 I think), which supports TLS v1.3, so is no longer a problem. And around the end of support for CentOS 7 (~ June 2024) the other problem server also stopped being a problem, I'm guessing due to deploying new media servers.
Among other things this means I no longer have a good test case to suggest for testing this problem.
So possibly the problem of \"TLS v1.2 EMS now required\" can be ignored, because at this point it should largely be very old (unmaintained) server installs that still cannot do TLS v1.2 EMS or TLS v1.3
Ewen
"""]]

@ -0,0 +1,49 @@
# Committing File Contents to Git: Unlock Confusion
I cannot convert a file from being annexed to having its content committed to git. Instead, annex commits a pointer to git, as if the file had been unlocked. This happens regardless of whether the key exists in `.git/annex/objects`. There is no workaround, it seems. At this point I have an annexed file committed to a repo. To go back and commit the file contents to git instead, I tried committing the deletion after running `git annex unannex` and then committing the file again via `git commit`. However, this still only commits the pointer contents to git, as shown by `git show HASH`. What's worse, the HASH found from `git log --raw` is the same hash that `git hash-object FILE` returns. So it looks like the file content was committed correctly, but it was not.
It appears to be that the hook hashes the file content, and if that content has ever been logged in the git-annex branch logs, it assumes the user just unlocked the file. I would hope that what is shown in `git log --raw` is in fact representative of the *content* saved to the git repo. I would assert then that annex should commit a git object with a hash for a pointer file that is **different** than for the file contents. So, if I have a "unlocked" pointer file of contents `/annex/objects/MD5E-s87104--942e5878169ea672dc8ab47889694974.txt` the object should be `6a/0da5de8f1a16a30b713b180972dadacb1edd7a`. Then if I manually hash-object the file and see `80d6030a72be1bb60644df613b1597793263a8d5` (the hash of the actual contents in my case) I can see that this content is in fact NOT within my git history yet.
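For comparing such hashes by hand, the id git assigns to a blob can be recomputed outside of any repository: it is the SHA-1 of the header `blob <size>\0` followed by the raw bytes. The sketch below is illustrative only. `git_blob_id` is a hypothetical helper, the trailing newline on the pointer text is an assumption, and the "content" bytes are a placeholder rather than the real file. No filters are applied here, which corresponds to `git hash-object --no-filters FILE`. When given a file path without that option, `git hash-object` applies the configured clean filter first.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id git gives a blob with exactly these bytes (SHA-1 repositories)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Pointer text quoted above; the trailing newline is an assumption.
pointer = b"/annex/objects/MD5E-s87104--942e5878169ea672dc8ab47889694974.txt\n"
# Placeholder standing in for the real file contents.
content = b"example file contents\n"

print(git_blob_id(pointer))
print(git_blob_id(content))
```

If the hash in `git log --raw` equals the id of the pointer text, a pointer was committed. If it equals the id of the raw file bytes, the content itself is in git.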
I notice that when I truly unlock a file, because I have (by default) `annex.thin=false`, the file content moves out of the annex on unlock, but *folder structure remains*. This is in contrast to unannex where the emptied `annex/objects/` tree is deleted. Maybe the hook checks for the existence of empty folders in the annex as a signal of unlock versus unannex? More trivially, if `annex.thin=true`, then maybe the inode count can indicate unlocking.
In case this is platform dependent here is my info:
```
git-annex version: 10.20240701
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.2 bloomfilter-2.0.1.2 crypton-1.0.0 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.3 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
operating system: darwin aarch64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
```
Install Details (Brew)
```
==> git-annex: stable 10.20240808 (bottled), HEAD
Manage files with git without checking in file contents
https://git-annex.branchable.com/
Installed
/opt/homebrew/Cellar/git-annex/10.20240701 (11 files, 167.2MB) *
Poured from bottle using the formulae.brew.sh API on 2024-07-18 at 13:46:03
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/g/git-annex.rb
License: AGPL-3.0-or-later and BSD-2-Clause and BSD-3-Clause and GPL-2.0-only and GPL-3.0-or-later and MIT
==> Dependencies
Build: cabal-install ✘, ghc@9.8 ✘, pkg-config ✔
Required: libmagic ✔
==> Options
--HEAD
Install HEAD version
==> Caveats
To start git-annex now and restart at login:
brew services start git-annex
Or, if you don't want/need a background service you can just run:
/opt/homebrew/opt/git-annex/bin/git-annex assistant --autostart
==> Analytics
install: 542 (30 days), 1,832 (90 days), 6,629 (365 days)
install-on-request: 439 (30 days), 1,574 (90 days), 5,735 (365 days)
build-error: 0 (30 days)
```

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="Spencer"
avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55"
subject="Exact Moment Things Go Wrong"
date="2024-08-13T06:22:11Z"
content="""
Hopefully this specific issue can be reproduced:
1. Have a repo with an annexed file committed.
2. Run `git annex unannex` on the locked file.
3. Run `git commit` to save the file as deleted on the index.
4. Drop the file contents in git annex (useful to have a remote with contents so you don't have to --force) by key (`git annex drop --key KEY`)
4a. Has to be done by key because `git annex unused` does NOT show the key as unused.
4b. Instead, `git annex whereused --key KEY --historical` should show `[here] branch~X:path/to/file` i.e. it's used X commits prior to the head `branch`
5. `git annex findkeys` to see key not there.
6. `git add FILE`
7. Key now back in annex, e.g. under `findkeys`.
7a. At this point, dropping the file contents appears to change the file size in `ls -Al`: a tiny (tens of bytes) file tells you that it's really a pointer file.
8. Never during this process will `ls -Al` show any indication that the file isn't a normal file after unannexing. inode = 1, no symlink. Just the file size changes if the contents aren't in the annex.
"""]]

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="Spencer"
avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55"
subject="Workaround: --force-small"
date="2024-08-13T07:05:57Z"
content="""
One workaround I've (finally) found is `git annex add --force-small` instead of `git add`. This **forces** annex to add the content to git. Phew!
What's even more interesting is that all along, `git hash-object` has been hashing the contents of the pointer file without me even knowing it. On my system when a file is a pointer file and I have the file contents in my annex:
- `ls -l` shows the file content size. Dropping the file from the annex changes this number to the pointer file string size (tens of bytes).
- `git hash-object FILE` hashes the **pointer file contents**. Reproduce the hash via `git cat-file -p :/path/to/FILE | git hash-object --stdin`. Trying `echo \"pointer\" | git hash-object --stdin` won't work with or without spaces. Also, I can `cat <file> | git hash-object --stdin` to see the real hash of the file contents.
In summary, *annex is committing what I want*: the hash of the actual contents stored in git. `hash-object`, annex, and git are somehow recognizing the file as a pointer file where `ls` cannot. I assume this is done by annex behind the scenes, which fascinates me because `git hash-object` otherwise isn't affected by repositories and can be run anywhere on any file.
Going forward - for others who run into this issue - you can use `git annex add --force-small` to overcome this confusion with unlock.
"""]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="Spencer"
avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55"
subject="Precise Workflow"
date="2024-08-22T00:18:27Z"
content="""
To be more precise on how to accomplish this - say for synchronizing special remotes for repos that are otherwise completely different - one might consider:
1. `git push destination git-annex:synced/git-annex`. This is what git-annex does under the hood during the `push` step of the `sync`.
1. In `destination`, run `git annex merge`. This performs the merging of `synced/git-annex` into `git-annex`.
I found this useful when I was trying to set up multiple repositories to use one central location (an rclone special remote) for file content sharing. Since the repos had a shared context (a project), but were otherwise disjoint from one another, `sync` was not an option. However, I felt odd running `git annex initremote` for each repo separately because then I could end up with myriad special remotes with the same configuration but different UUIDs for each. Ultimately this is not a problem - to have the same special remote have different UUIDs in different repositories - so long as the repos **never** come in contact. But I, novice as I was, had already muddled the git-annex branches of these repos together, so for the sake of cleanliness I went back and reimplemented these special remotes with the same UUID on every repo. This often involved adding repos as remotes to one another, fetching - which implicitly performs some merging - and then pushing (as above) any metadata changes to the repo, leaving content changes untouched.
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Spencer"
avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55"
subject="Still a Problem (on Mac?)"
date="2024-08-13T04:21:32Z"
content="""
This is still occurring for me. If I unannex, then commit the deletion, drop the key, and add back the file now with `git add`, annex usurps the commit and commits only a pointer. Then, `git annex find --unlocked` shows the file as an unlocked annexed file.
`git show HASH` shows the pointer, not the file contents, thus my worry that the contents are lost in the git repo. What's worse, `git hash-object` gives the same hash as shown in `git log --raw` so by plain inspection of the log it seems like the content is properly logged in the git repo but it's not!
It appears the bug you mentioned has been closed/deleted (instead of moved to done). I am going to reopen it and put in my details.
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="pedro-lopes-de-azevedo"
avatar="http://cdn.libravatar.org/avatar/492b7020bff4e7cb466e95dfd72fd206"
subject="parameter --from not accepted"
date="2024-08-14T14:27:53Z"
content="""
Recently I tested the export command adding the `--from` parameter and it was not accepted.
git-annex version: 10.20240701
"""]]

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="Spencer"
avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55"
subject="Remote Helper?"
date="2024-08-17T05:33:01Z"
content="""
Homebrew doesn't seem to install the remote helper (`git remote-annex` is not a known command).
Building from source doesn't work because brew installs base>4.20 which is incompatible with filepath-bytestring. Since homebrew is against backward compatibility I presume changing base version by installing a different ghc is out of the question.
Maybe there's a way to do this with sandboxes? I'm not familiar with haskell, can anyone update the build recipe on how to build git-annex on MacOS (Apple silicon)? As I understand it one would need:
1. `brew install ghc cabal-install haskell-stack` instead of `haskell-platform`
1. option `--bindir -> --installdir`
1. To specify `extra-lib-dirs` and `extra-include-dirs` to `/opt/homebrew/(lib|include)` respectively in cabal config or as additional options
1. `base` version `< 4.20` must be installed when installing `ghc`
This is where I got stuck because I can't reinstall `base` without understanding sandboxes or installing a different GHC version (I think? This is effectively my first exposure to haskell)
"""]]

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="gauss@055c9051f507c97fa5612f46c74ce636f5ecde10"
nickname="gauss"
avatar="http://cdn.libravatar.org/avatar/07c3a0c551ecfe4aa8c047ff5f6f4e79"
subject="No root privileges server - annex-shell replaced by git-annex-shell"
date="2024-08-23T01:51:49Z"
content="""
I've cloned a git repository through ssh from a server on which I don't have root privileges.
The clone command is something like:
git clone ssh://johndoe@somedomain.com:23/home/johndoe/Downloads/gitannextest4/
I tried to enable the remote and I get the error: *Remote gitannextest4 does not have git-annex installed; setting annex-ignore*.
I had no success following the steps here.
I believe there is an error in the last of the alternatives presented here:
git config remote.annoyingserver.annex-shell /home/me/bin/git-annex-shell (does not work)
git config remote.annoyingserver.git-annex-shell /home/me/bin/git-annex-shell (works!)
So, **annex-shell** should be replaced by **git-annex-shell**.
Hope it helps.
"""]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="Matthew"
avatar="http://cdn.libravatar.org/avatar/495960d189a9cdc26f8e449bbf28aaf4"
subject="Help with .nfsXXXX files"
date="2024-08-19T21:20:58Z"
content="""
I have many dozens of .nfs files that I cannot seem to remove. I have had IT reboot the machine I was using with git-annex, as well as the file server, in hopes of killing the processes that have the files open. The files stubbornly remain, and cannot be removed with 'rm -f .nfsXXXX', which results in \"rm: cannot remove .nfsXXXX: Permission denied\", even after the reboots.
Any thoughts are appreciated, as I have a few hundred gigabytes tied up in these files.
My next step is to see about working with IT to put the file server in single-user mode, and getting root access to see if we can remove the files. But I'm hoping there are some other suggestions before taking such a drastic step.
"""]]

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2024-08-29T18:35:47Z"
content="""
just ran into this again with `datalad push` which surprised me (since I do not get into it with regular `git push`), and took me a bit to figure out/find this issue.
> Unless there's some reason why you need git to pull from the http url rather than from the ssh url?
It is my pattern of working with git -- clone via public URL whenever possible (so I do not have to load/use any ssh key without necessity; could use the same URLs on public and private hosts alike) and only when needed to push, automagically push via ssh. FWIW I really love such workflow and use it not only for github but other hosting providers too!
And IMHO it would indeed make total sense to have a similar separation of \"use the public/read access route regardless of whether credentials for private/write are available, and use the secure/authenticated route only if write/push is necessary\" for git-annex too. `insteadOf` does not allow such a separation, but it would at least allow a \"location-wide\" override to use the secure/authenticated route even when a simpler public access route is possible.
Indeed adding such a feature parity with `git` might break existing setups, but I would say it should only fix a possible divergence and remove the surprise that annex is behaving differently from how git does it. IMHO it is unlikely someone had `pushInsteadOf` configured to have `git` push somewhere else (thus git-annex branch going there too) while still somehow interested to use original URL for git-annex.
"""]]

@ -0,0 +1,8 @@
It is desirable to be able to get keys that correspond to some commit which would otherwise be hard or undesirable to check out (tree too big, tree in active use, etc.).
But it seems it is impossible to do so ATM:
```shell
yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex get --branch e6888f70ed97099f83a77d5bcf3372a9a75a2b5e^ '**/*.nii.gz'
git-annex: Can only specify one of file names, --all, --branch, --unused, --failed, --key, or --incomplete
```

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de"
subject="comment 1"
date="2024-08-28T08:47:32Z"
content="""
While a migration to VURL by default would be great, I think this issue when dealing with external special remotes should first be fixed: <https://git-annex.branchable.com/bugs/VURL_verification_failure_on_first_download/>. Right now, my datalad-cds extension does not explicitly set the URL backend, so it would break if the default was to change to VURL, but I would really like to use VURL with it if it was possible.
"""]]

doc/users/Spencer.mdwn Normal file
@ -0,0 +1,15 @@
---
## Contributions
### Contributed Pages
[[!map pages="author(Spencer) and !internal(recentchanges/*) and !comment(*)"
show="title"
sort="title"]]
### Comments
[[!map pages="author(Spencer) and comment(*)"
show="title"
sort="title"]]