From ba561159e13614568aefe3d83db47e2f379ff9a0 Mon Sep 17 00:00:00 2001
From: Spencer
Date: Thu, 19 Jun 2025 01:34:17 +0000
Subject: [PATCH] Added a comment: A (Mildly) Compelling Reason

---
 ..._743a6d9f8f4061f429a844024ba1208f._comment | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment

diff --git a/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment b/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment
new file mode 100644
index 0000000000..e9887c02c4
--- /dev/null
+++ b/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment
@@ -0,0 +1,23 @@
+[[!comment format=mdwn
+ username="Spencer"
+ avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55"
+ subject="A (Mildly) Compelling Reason"
+ date="2025-06-19T01:34:17Z"
+ content="""
+This feature would alleviate one problem I have with annex: the path stored in an annex symlink depends on where in the tree the file sits.
+This makes each *`git`* object of an annexed file in a different folder unique.
+If annexed files ever move, a fairly useless new git object is introduced into the repo.
+Not at all a problem for one file, but if you have tens of thousands of annexed files and you refactor, you start to notice that.
+
+Unlocked files don't have this problem because their blobs point agnostically to the annex and key.
+But, of course, unlocking large numbers of files means content copies, so that's not great.
+
+Symlink chains alleviate this because if I have a chain like `.root -> ./` in the root and `.root -> ../.root` in essentially every directory, then annex symlinks become agnostic too.
+And on the git side, that's two new objects to add, and only a new tree object when performing a move.
+
+Again, this is only relevant when the number of files becomes massive.
+For a sense of scale, let's assume a symlink payload is on the order of 100 bytes.
+So 10,000 files generate roughly 1 MB of git objects, meaning if I had 100,000 files and moved them all once, I'd have 20 MB of data dedicated to locating these files, with 10 MB of that being what I would deem waste.
+Honestly, annex and git slow down appreciably at that scale for other reasons (pull/push/checkout, especially on slower file systems), so I'd say this is a non-issue by comparison.
+For those who had similar concerns, there's your benchmark: roughly 1 MB of bloat per 10,000 files per move!
+"""]]