From ebaedbd1753fcb3d34ddaec9f1d04a1f70fcdf7c Mon Sep 17 00:00:00 2001 From: NorsePaladin Date: Sat, 7 Jun 2025 03:21:24 +0000 Subject: [PATCH 1/6] Special remote protocol: How to identify exact size of a particular key? --- ...ecial_remote_protocol__58___How_to_identify_exactsize.mdwn | 4 ++++ 1 file changed, 4 insertions(+) create mode 100644 doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn diff --git a/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn new file mode 100644 index 0000000000..472d69d1f9 --- /dev/null +++ b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn @@ -0,0 +1,4 @@ +I'm trying to write a special remote protocol in which it would be really helpful to have the exact size for a particular key. I was thinking of something like the special remote asking git-annex `GETKEYINFO ` and git-annex responding with some useful info (something like a dictionary of useful values, maybe?) + +I considered doing something like `git annex info ..` to figure this out but realized it's a bad idea (that'll be very brittle, plus it won't work well with chunked/encrypted remotes at all). Does git-annex typically have this info available?
It would even be helpful if it only gives responses in specific cases (eg: no encryption since it'll presumably be hard to keep track of that case) + From f361e0ef4b4c925af15b14be0effa65733f44f2d Mon Sep 17 00:00:00 2001 From: nobodyinperson Date: Sat, 7 Jun 2025 09:39:28 +0000 Subject: [PATCH 2/6] =?UTF-8?q?Added=20a=20comment:=20Now=20the=20current?= =?UTF-8?q?=20branch=20is=20pushed=20first!=20=F0=9F=A5=B3?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ..._3f06a7f454747ec9359dd4c45b12a563._comment | 57 +++++++++++++++++++ 1 file changed, 57 insertions(+) create mode 100644 doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment new file mode 100644 index 0000000000..4910adf894 --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment @@ -0,0 +1,57 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Now the current branch is pushed first! 🥳" + date="2025-06-07T09:39:27Z" + content=""" +Thank you very much joey, I can confirm that the current branch is now pushed first and thus used as the default branch of the newly created repo: + +## New version + +[[!format bash \"\"\" +$ git annex version --raw +10.20250605-gb9e3cf8780a04c8b1ac0cf4768c9ec510483477c$ +$ git init repo +Initialized empty Git repository in /home/yann/Downloads/git-annex.linux/repo/.git/ +$ cd repo +$ git annex init +init ok +(recording state in git...) +$ git remote add homelab ssh://.../yann/testrepo +$ touch bla +$ git annex assist +add bla ok +(recording state in git...) 
+commit (recording state in git...) +ok +pull homelab ok +push homelab ok +$ git remote show homelab | grep HEAD + HEAD branch: main ✅✅✅✅✅✅✅✅✅✅✅✅✅ +\"\"\"]] + +## Old version + +[[!format bash \"\"\" +🐟 ❯ git annex version --raw +10.20250521-g1a9e6bf26b56c39429d4a096bf733e57e5684e1b +🐟 ❯ git init repo2 +Leeres Git-Repository in /home/yann/Downloads/repo2/.git/ initialisiert +🐟 ❯ cd repo2/ +🐟 ❯ git annex init +init ok +(recording state in git...) +🐟 ❯ git remote add homelab ssh://.../yann/testrepo2 +🐟 ❯ touch bla +🐟 ❯ git annex assist +add bla ok +(recording state in git...) +commit (recording state in git...) +ok +pull homelab ok +push homelab ok +🐟 ❯ LC_ALL=C.UTF-8 git remote show homelab | grep HEAD + HEAD branch: synced/main ⚠️⚠️⚠️⚠️⚠️ +\"\"\"]] + +"""]] From ad7a880fc166051267202f870896f8f27d6999c8 Mon Sep 17 00:00:00 2001 From: Spencer Date: Sun, 8 Jun 2025 16:37:47 +0000 Subject: [PATCH 3/6] --- doc/forum/Import_-_Changing_Largefiles.mdwn | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 doc/forum/Import_-_Changing_Largefiles.mdwn diff --git a/doc/forum/Import_-_Changing_Largefiles.mdwn b/doc/forum/Import_-_Changing_Largefiles.mdwn new file mode 100644 index 0000000000..a49fe3d970 --- /dev/null +++ b/doc/forum/Import_-_Changing_Largefiles.mdwn @@ -0,0 +1,20 @@ +# Changing Largefile Specification for Imported Trees + +If you want files to be large/small *after* already importing a tree from an `importtree` enabled remote, well, it appears you can't. + +I tried removing the imported branch via `git branch -d --remote /`. +While this produces a new clean import commit upon running `import` again, it does *not* respect changes to `.gitattributes`. +Instead, `git-annex` seems to hold onto information about which files were large/small in a given special remote. 
+So, the only way to change what are considered large files and small files is to create a new special remote entirely :/ + +For most people, this should not be too problematic since the history of imported trees isn't too important, but for some, diffs on an external tree may be valuable. +Is there any interest in addressing this issue? +For a better understanding, here is an MWE to reproduce this: + +1. Create an `importtree` enabled special remote for a fresh repo without a `.gitattributes` file (or at least one without `annex.largefiles` attributes) +1. Import (e.g. `gx import -f tree main`) from this tree and note that all files are considered large (e.g. `git log --raw tree/main` -> `git show `) +1. Modify/create a local `.gitattributes` file (and add it to the index) that would specify one of the tree files as small (i.e. `annex.largefiles` does *not* match) +1. Attempt a new import, or do `git branch -d --remote tree/main` and perform a new import. +1. Note that all files are still considered large. + +Maybe there's another way of fixing this that I don't know about, but as far as I know, from this point you have to delete the special remote and redo the above, now with the desired `.gitattributes` file staged, for files in this external tree to be imported as small. From 9fe60062a38228594ce8d48bbe1b14532934f22d Mon Sep 17 00:00:00 2001 From: Spencer Date: Sun, 8 Jun 2025 16:43:11 +0000 Subject: [PATCH 4/6] Linked to discussion on caveat --- doc/git-annex-import.mdwn | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index e78fa0ac14..a0ecc9fa08 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -229,10 +229,12 @@ link, and that symbolic link will be followed. Note that using `--deduplicate` or `--clean-duplicates` with the WORM backend does not look at file content, but filename and mtime.
-If annex.largefiles is configured, and does not match a file, `git annex -import` will add the non-large file directly to the git repository, +If `annex.largefiles` is configured (in the current repo's `.gitattributes` file), +and does not match a file, `git annex import` will add the non-large file directly to the git repository, instead of to the annex. +[Caveat Discussion: Adjusting Largefiles Specification](forum/Import_-_Changing_Largefiles) + # SEE ALSO [[git-annex]](1) From 3aa281d185f3653a4e453cf730ee6ca6b5a8e9e8 Mon Sep 17 00:00:00 2001 From: Spencer Date: Sun, 8 Jun 2025 16:48:06 +0000 Subject: [PATCH 5/6] For one: why would preview show a nonexistent page as an existent link instead of a question mark? For two: why is []() syntax relative to current page but [[|]] syntax is relative to root? --- doc/git-annex-import.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index a0ecc9fa08..71b7dedbb4 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -233,7 +233,7 @@ If `annex.largefiles` is configured (in the current repo's `.gitattributes` file and does not match a file, `git annex import` will add the non-large file directly to the git repository, instead of to the annex. 
-[Caveat Discussion: Adjusting Largefiles Specification](forum/Import_-_Changing_Largefiles) +[[Caveat Discussion: Adjusting Largefiles Specification|forum/Import_-_Changing_Largefiles]] # SEE ALSO From c2b079a89341c92c1b175cae6f3dbe009dd29b21 Mon Sep 17 00:00:00 2001 From: "beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" Date: Mon, 9 Jun 2025 13:09:25 +0000 Subject: [PATCH 6/6] Added a comment --- ..._0785f4683f0e7f9848aced2357ca1ec0._comment | 58 +++++++++++++++++++ 1 file changed, 58 insertions(+) create mode 100644 doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment diff --git a/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment b/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment new file mode 100644 index 0000000000..8606110243 --- /dev/null +++ b/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment @@ -0,0 +1,58 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 6" + date="2025-06-09T13:09:25Z" + content=""" +I'm getting acquainted with this special remote. I cannot praise it enough. It is brilliant. 
+ +This is my first cut of git-annex-compute-stripexif: + +[[!format bash \"\"\" +#!/bin/bash + +set -e + +if [ -z \"$1\" ]; then + echo \"Specify the input image file, followed by the output image file.\" >&2 + echo \"Example: foo.jpg foo.gif\" >&2 + exit 1 +fi + +echo REPRODUCIBLE +echo \"INPUT $1\" +read input + +if [ -n \"$input\" ]; then + tf=$(mktemp) + cp \"$input\" \"$tf\" >&2 + exiftool -overwrite_original -ALL= \"$tf\" >&2 + outfile=\"SANSEXIF-\"$(git-annex calckey \"$tf\") +fi +echo \"OUTPUT $outfile\" +read output + +cp -v \"$tf\" \"$outfile\" >&2 +rm -v \"$tf\" >&2 +\"\"\"]] + +Along the way, I've learnt that EXIF metadata isn't the only metadata stored in a jpeg, so the name is now a bit of a misnomer. Also, as it was more proof-of-concept, the target name and location are not well thought out, and there's no preservation of file extension. It's indicative for now. + +The aim is to aid (only) in identifying two copies of the same jpeg, where only the metadata has been changed (eg. either by adjustments I made by script eons ago, or by apps like Microsoft photoviewer where orientation changes were made via metadata). I say aid only, because it's not going to help if the image is resized, etc. and I understand that. + +To that end, I do have some questions. The first is... is it wise (or possible) to try to set metadata on the source files whilst in the script? (Since writing this, I have come to understand that the compute script is not run within the working directory, and the implication is that you're not meant to run any git-annex commands.) + +Obviously, the idea would be to tag the source file with the computed key. I have already verified that if two copies of a jpeg differ only by metadata, the computed file and key will be the same. + +But what I found is, if I don't have that option to set metadata, then respectfully, git-annex-findcomputed may have some deficiencies.
+ +From what I can gather, git-annex-findcomputed will not list a subsequent input file that, when added, computes the same file; it lists only the first one. + +So trying to post-process the computed files to set metadata on the source files would likely not work. + +Also, I was curious about what happens if the input file moves within the archive? I haven't tried... but from what I can see, you wouldn't be able to backtrack from the computed file, because you won't know the key of the input file, in turn to go searching for it (eg. git-annex-whereused). + +Is my use case way off base as to why you should use the compute remote? + +"""]]
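
On the exact-size question raised in the first patch of this series: git-annex keys usually carry the content size in an `-s` field (for example `SHA256E-s1048576--abc.jpg`), so a special remote can often read the size straight from the key it is handed. The sketch below is only an illustration. `key_size` is a hypothetical helper, not part of git-annex, and it assumes the usual key layout `BACKEND-sSIZE-mMTIME-SCHUNKSIZE-CCHUNKNUM--NAME`, where every field between the backend and the name is optional. Keys without a size field (some URL keys, and the encrypted keys that an encrypted remote sees) produce no output, and for chunk keys the `-s` field most likely still describes the whole object rather than the individual chunk.

```shell
#!/bin/bash
# Hypothetical helper (not part of git-annex): print the size recorded in a
# key's lowercase -s field. Returns 1 if the key has no such field.
key_size() {
    local key=$1
    local fields field
    local -a parts

    # A key without "--" does not follow the assumed layout.
    case $key in
        *--*) ;;
        *) return 1 ;;
    esac

    # Everything before the first "--" is the backend plus optional fields;
    # the key name after it may itself contain "-s...", so it is cut off.
    fields=${key%%--*}
    IFS=- read -r -a parts <<< "$fields"

    # Skip the backend name, then look for a lowercase "s" followed by digits.
    # The uppercase "S" field is the chunk size and is deliberately ignored.
    for field in "${parts[@]:1}"; do
        if [[ $field =~ ^s([0-9]+)$ ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}
```

For instance, `key_size 'SHA256E-s1048576--abc.jpg'` prints `1048576`, while a key such as `URL--http://example.com/` has no size field and makes the function return 1. Because the size is absent in exactly the cases the post worries about (encryption in particular), a remote relying on this should treat a missing size as "unknown" rather than as an error.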