From 925c203c0939b2cb542c4cdab38d5aaa0e23b310 Mon Sep 17 00:00:00 2001
From: "lucas.gautheron@f2b5c93a64b028c1ec8698b9c2412ed51ff22040"
Date: Mon, 2 Sep 2024 15:08:25 +0000
Subject: [PATCH 1/2]

---
 doc/forum/Copy_portion_of_file_from_remote.mdwn | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 doc/forum/Copy_portion_of_file_from_remote.mdwn

diff --git a/doc/forum/Copy_portion_of_file_from_remote.mdwn b/doc/forum/Copy_portion_of_file_from_remote.mdwn
new file mode 100644
index 0000000000..4701835e6b
--- /dev/null
+++ b/doc/forum/Copy_portion_of_file_from_remote.mdwn
@@ -0,0 +1,15 @@
+Hi,
+
+My peers and I work with longform audio recordings (10-20 hours each).
+We often need to sub-sample small portions (typically 1 percent) of many of these recordings in unpredictable ways, for various reasons.
+Unfortunately, to do so, we must download entire recordings of ~1GB each, even if we end up using only 1% of each of them.
+This can take hours.
+
+My question is: how hard would it be to download specific ranges of bytes from a remote repository (given start/end cursors)?
+Given that git annex can resume interrupted downloads, I assume there is already some code for readings bytes from a remote, starting a specific position in the file.
+What would be the easiest way of doing this? Is it achievable via git annex' interface? Or would it require a change in git annex itself?
+(My colleagues and I aren't proficient in Haskell and we can't really maintain binaries for the multiple platforms we work with).
+
+If this requires a patch to git-annex, maybe this would be a feature of interest to a broader share of people?
+
+Best,

From 850ea3a9b850e9bb5d7a838203a95b0eb462fbdc Mon Sep 17 00:00:00 2001
From: "lucas.gautheron@f2b5c93a64b028c1ec8698b9c2412ed51ff22040"
Date: Mon, 2 Sep 2024 15:12:02 +0000
Subject: [PATCH 2/2]

---
 doc/forum/Copy_portion_of_file_from_remote.mdwn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/forum/Copy_portion_of_file_from_remote.mdwn b/doc/forum/Copy_portion_of_file_from_remote.mdwn
index 4701835e6b..60470a3f90 100644
--- a/doc/forum/Copy_portion_of_file_from_remote.mdwn
+++ b/doc/forum/Copy_portion_of_file_from_remote.mdwn
@@ -6,7 +6,7 @@ Unfortunately, to do so, we must download entire recordings of ~1GB each, even i
 This can take hours.
 
 My question is: how hard would it be to download specific ranges of bytes from a remote repository (given start/end cursors)?
-Given that git annex can resume interrupted downloads, I assume there is already some code for readings bytes from a remote, starting a specific position in the file.
+Given that git annex can resume interrupted downloads, I assume there is already some code for readings bytes from a remote, starting from specific positions in a file.
 What would be the easiest way of doing this? Is it achievable via git annex' interface? Or would it require a change in git annex itself?
 (My colleagues and I aren't proficient in Haskell and we can't really maintain binaries for the multiple platforms we work with).
 