diff --git a/doc/bugs/Invalid_option___96__--time-limit__61__1m__39___with_pull.mdwn b/doc/bugs/Invalid_option___96__--time-limit__61__1m__39___with_pull.mdwn new file mode 100644 index 0000000000..42f42f366a --- /dev/null +++ b/doc/bugs/Invalid_option___96__--time-limit__61__1m__39___with_pull.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. + +git annex pull reports `--time-limit=1m` as an invalid option, even though its manpage states that it can use the options from git-annex-common-options and the manpage for those includes the --time-limit option. + + +### What steps will reproduce the problem? + +git annex pull while specifying the --time-limit option. + + +### What version of git-annex are you using? On what operating system? + +[[!format sh """ +$ git annex version +git-annex version: 10.20240227 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.4 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +"""]] + +On Ubuntu, but git-annex is installed from a recent nixpkgs. 
+ + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + +$ git annex pull --time-limit=1m +Invalid option `--time-limit=1m' + +Usage: git-annex COMMAND +[...] + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I am currently working on a web interface that lets non-git-annex users request files to be available in a specific repository (which will be located on the storage cluster of a HPC system). A combination of git annex metadata and an appropriate required content expression should make this essentially trivial, which is nice. The time-limit option would be helpful for the background worker doing all the fetching, though. diff --git a/doc/bugs/assistant___40__webapp__41___commited_unlocked_link_to_annex.mdwn b/doc/bugs/assistant___40__webapp__41___commited_unlocked_link_to_annex.mdwn new file mode 100644 index 0000000000..1b084030c1 --- /dev/null +++ b/doc/bugs/assistant___40__webapp__41___commited_unlocked_link_to_annex.mdwn @@ -0,0 +1,106 @@ +### Please describe the problem. 
+ +Today I noticed odd commits happening such as + +``` +❯ git show 4a157861f3d27a40b38ae441dfe306e45e448c66 +commit 4a157861f3d27a40b38ae441dfe306e45e448c66 +Author: ReproStim User +Date: Wed Apr 17 09:22:04 2024 -0400 + + git-annex in reprostim@reproiner:/data/reprostim + +diff --git a/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log b/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +index fc930f54..92b79020 100644 +--- a/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log ++++ b/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +@@ -1 +1 @@ +-/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log ++/annex/objects/MD5E-s69--08983cc11522233e5d4815e4ef62275a.mkv.log +``` + +-- today is April but commits are for files in March... + +There is `git annex webapp` running which is configured to offload all content to another host. + +And actual patch shows that it pretty much annexed the "unlocked link" file after the file was offloaded to remote host. + + +Do not have a minimal reproducer yet, but I think it happened while + +- I had initially .log files which are text going to git +- then I added to `.gitattributes` + +``` +*.log annex.largefiles=anything +``` + +but it was never committed (? I assumed that annex webapp/assistant would do that -- it didn't) -- only now I did that. +- not sure how this morning was special... + +The most interesting is that if I `annex get` -- I do get correct file... + +It is like an inception!!! + +On the fresh clone, if I look inside that file I see short key: + +``` +❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +/annex/objects/MD5E-s69--08983cc11522233e5d4815e4ef62275a.mkv.log +``` + +then, if I `annex get` it -- I get content with long key + +```shell +❯ git annex get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log (from rolando...) +ok +(recording state in git...) 
+❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log +``` + +then upon subsequent get -- I will get the actual content: + +```shell +❯ git annex get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log (from rolando...) +ok +(recording state in git...) +❯ head -n 1 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +2024-03-17 14:09:12.551 [info] [685899] Session logging begin : reprostim-videocapture 1.5.0.119, session_logger_2024.03.17.14.09.12.550, start_ts=2024.03.17.14.09.12.550 +``` +and dropping it would lead me just to the "long key" + +``` +❯ git annex drop 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +drop 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log (locking rolando...) ok +(recording state in git...) +❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log +``` + +and will not be able to come out into reality from the 2nd level of inception: + +``` +❯ git annex drop 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log +/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log +``` + + +### What version of git-annex are you using? On what operating system? 
+ +on original server with webapp: 10.20240227-1~ndall+1 + +on intermediate server through which transfer of files happens: I think it might be old + +``` +[bids@rolando VIDS] > git annex version +git-annex version: 6.20180808-ga1327779a +``` + +on laptop where I dive into inception: 10.20240129 + +[[!meta author=yoh]] +[[!tag projects/repronim]] diff --git a/doc/bugs/test__58___posix__95__spawnp_broken_10_on_darwin/comment_2_3fb2b47ade727a1bc6a99120b68d98b7._comment b/doc/bugs/test__58___posix__95__spawnp_broken_10_on_darwin/comment_2_3fb2b47ade727a1bc6a99120b68d98b7._comment new file mode 100644 index 0000000000..de30ce0b65 --- /dev/null +++ b/doc/bugs/test__58___posix__95__spawnp_broken_10_on_darwin/comment_2_3fb2b47ade727a1bc6a99120b68d98b7._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Happens on nix on MacOS as well" + date="2024-04-13T15:51:54Z" + content=""" +I just had the same problem. An `addurl` test failed while building on MacOS with `nix-shell -p git-annex`. I recorded a video of it coincidentally. I don't know what's with the `security` program it tries to call. I was using a MacOS VM via [Docker-OSX](https://github.com/sickcodes/Docker-OSX?tab=readme-ov-file#big-sur-). +"""]] diff --git a/doc/forum/Alternative_modes_for_annex_repos.mdwn b/doc/forum/Alternative_modes_for_annex_repos.mdwn new file mode 100644 index 0000000000..6d707d3b14 --- /dev/null +++ b/doc/forum/Alternative_modes_for_annex_repos.mdwn @@ -0,0 +1,35 @@ +I have given a lot of thought and experimentation to how git-annex could be used for large projects where there is a desire to distribute files to many users, but where only a minority of users would actually push key changes. + +The first option would be to have an annexless mode, where a local repo either has no uuid, or where the git-annex branch is not stored in the default namespace. 
+ +This is for cases where the client only cares that a file exists in the repo and that it has been verified. + +One possibility could be `git-annex-get --no-init`, which would not init a local repo, but would get and verify a file. Whether a file is present would be determined simply by whether the file exists. Only upon making a change would the repo be fully inited. + +Or even better, in a restricted environment where git-annex is not available, this case is simple enough that getting a key from a url could be done with a shell script. The url could be extracted from the upstream git-annex branch without checking it out, and the symlinks used for verification. However, there is a chance that the upstream git-annex branch may not be stable (like if it is not propagated after a mirror), so one could "shrinkwrap" keys and store their remote url locations in a .gitattributes file or somewhere else in the same branch. If key changes are desired, it can be fairly effortlessly upgraded to an actual git-annex repo. + +A step up from a completely annexless repo would be a hypothetical local-only git-annex repo, where git-annex only uses a git-annex branch locally. + +There could be a `git-annex-init --local` option which creates a `local/git-annex` branch, for local tracking, but would not sync to the server by default. + +In this mode, the upstream git-annex branch would just be pulled and kept read-only, and `local/git-annex` would keep local differences. The `local/git-annex` would just use the union driver to combine upstream changes with local changes. Upgrading to a full git-annex repo would be as easy as creating a new `git-annex` branch at the same commit id as `local/git-annex`. + + +So, in summary, I have considered two modes: + +Fully annex-less mode, which is simple enough to be implemented completely without git-annex, useful for restricted environments. 
Optionally, it can use a kind of shrinkwrapping to externalize key URLs to a file in the branch to guarantee that the fetch location is stored. + +The second is local mode, where a `local/git-annex` branch is downstream from a git-annex branch, and in order to sync changes back to the server, the repo is upgraded. + +Both of these modes could easily be upgraded to a full git-annex repo on demand. + +I think this is useful when considering large scale usage. + + +Most of this functionality is something that is probably best suited for a wrapper. + +In terms of any potential core changes to git-annex, it may be as simple as having a GIT_ANNEX_BRANCH environment variable, analogous to the GIT_DIR variable for git. + +Has anyone given any thought to scenarios like this? + +I think there are cases where developers use git-lfs and something like this might be a better fit. And also with making git-annex repos more generally available and portable. diff --git a/doc/forum/How_to_allow_clones_to_get_files_via_URL__63__.mdwn b/doc/forum/How_to_allow_clones_to_get_files_via_URL__63__.mdwn new file mode 100644 index 0000000000..f3978d8123 --- /dev/null +++ b/doc/forum/How_to_allow_clones_to_get_files_via_URL__63__.mdwn @@ -0,0 +1,52 @@ +I'm working with Datalad, but I suspect that my problems stem from not fully understanding how git annex works. + +I’ve been trying to set up a dataset that primarily lives on a web server, but needs to be clone-able by other people. The annex files are visible and downloadable from the server’s website. In particular, the files I’m concerned about here are in a subdataset. + +I used `datalad addurls` to add the URL of each file on the server to each file in the annex. When I run `git annex whereis filename`, it shows that it lives on the server in the server’s local copy of the dataset, and that it lives on the web, with a correct URL. In fact, if I click on that URL and open it in a browser, it downloads my file. 
+ +The dataset lives on Github, but the annex does not. When I make a clone of the superdataset on my personal computer, I get messages like + +``` +[INFO ] Unable to parse git config from origin +[INFO ] Remote origin does not have git-annex installed; setting annex-ignore +| This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin +install(ok): /home/erin/Documents/DHA/carcas (dataset) +``` + +Then when I'm in the dataset `carcas-models` that has the annex and I run `datalad get models +/Alpaca\ 3rd\ Carpal\ L.glb`, I get this error message: + +``` +get(error): models/Alpaca 3rd Carpal L.glb (file) [no known url +no known url +no known url] +``` + +I suspect my problem is with how I set things up with git annex, because when I try `git annex get models/Alpaca\ 3rd\ Carpal\ L.glb`, I get the error: + +``` +get models/Alpaca 3rd Carpal L.glb (from web...) + no known url + + Unable to access these remotes: web + + Maybe add some of these git remotes (git remote add ...): + 095e299d-037e-4172-87e0-bbd7183a6613 -- CARCAS models on the 3dviewers server + + (Note that these git remotes have annex-ignore set: origin) +failed +get: 1 failed +``` + +I'm confused on how to debug this because when I run git annex whereis models/Alpaca\ 3rd\ Carpal\ L.glb, everything looks correct: + +``` +whereis models/Alpaca 3rd Carpal L.glb (2 copies) + 00000000-0000-0000-0000-000000000001 -- web + 095e299d-037e-4172-87e0-bbd7183a6613 -- CARCAS models on the 3dviewers server + + web: https://3dviewer.sites.carleton.edu/carcas/carcas-models/models/Alpaca%203rd%20Carpal%20L.glb +ok +``` + +What's the correct way to set up this use case? I don't think that I want the server to be a special remote, because the hidden files like .gitattributes aren't visible. 
I want to be able to put more files on the server, add their URLs based on where they are on the server, and push to GitHub so that other people can get these files if they want. diff --git a/doc/forum/Using_git-annex_as_a_library/comment_10_cfded5c6325007c3f3f83818bd2e1dbc._comment b/doc/forum/Using_git-annex_as_a_library/comment_10_cfded5c6325007c3f3f83818bd2e1dbc._comment new file mode 100644 index 0000000000..8583952330 --- /dev/null +++ b/doc/forum/Using_git-annex_as_a_library/comment_10_cfded5c6325007c3f3f83818bd2e1dbc._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="unqueued" + avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d" + subject="comment 10" + date="2024-04-10T12:26:49Z" + content=""" +For my two cents, I have found git-annex to be a simple enough format that I have only needed basic helper scripts. + +But many operations can be done with one or a few lines of code. + +Git can do much of the heavy lifting for you in terms of looking stuff up from the git-annex branch, and I find the formats to be quite regular and easy to parse. + +I am thinking of bringing some of this together into a PHP library. + +But maybe I should just post my pure git-annex bash/perl one-liners. +"""]] diff --git a/doc/forum/When_to_reuse_UUIDs_and_avoiding_UUID_clutter.mdwn b/doc/forum/When_to_reuse_UUIDs_and_avoiding_UUID_clutter.mdwn new file mode 100644 index 0000000000..fc5b75ff15 --- /dev/null +++ b/doc/forum/When_to_reuse_UUIDs_and_avoiding_UUID_clutter.mdwn @@ -0,0 +1,15 @@ +I wanted to discuss cases for UUID reuse. + +One reason is to mutate a special remote type. For example, converting a directory special remote to an rsync special remote and vice versa, passing along the uuid argument to initremote. Changing the directory layout is not hard. And you may wish to re-layout your .git/annex/objects directory to a different directory prefix and upload it to a cloud provider. 
If it supports an rclone remote that has hashing, you can verify it without having to redownload. + +Another good reason to reuse a uuid is to avoid uuid namespace clutter. +If you know ahead of time that you are storing data in repos that may later be merged, it makes sense to have a template annex repo to base a new repo off of, as well as store common settings and uuids. + +For example, I have a uuid space for multimedia annexes (peertube, youtube, podcasts, etc). + +The template comes preloaded with a uuid.log and remote.log. If my hostname is in the uuid.log, I reinit with that. + +If I must merge unrelated histories with conflicting name/uuid values, I first prefix my names with something. After a merge, I can do some gymnastics to make sure that the proper keys are set present for the respective uuid/name that I have chosen, and make the obsolete uuid/name dead. Simply making them dead is not enough, because even if a special remote uuid is marked dead, if the name is the same, it will still cause a conflict, so prefixing uuid/name collisions is important. + +I currently have several annex template repos for different purposes (disk images, multimedia, etc). I have been meaning to automate this process more. 
+ diff --git a/doc/forum/copying_annex_between_remotes_manually__63__/comment_3_875474a913d23823e3866cca27cfa3ac._comment b/doc/forum/copying_annex_between_remotes_manually__63__/comment_3_875474a913d23823e3866cca27cfa3ac._comment new file mode 100644 index 0000000000..19fff278ac --- /dev/null +++ b/doc/forum/copying_annex_between_remotes_manually__63__/comment_3_875474a913d23823e3866cca27cfa3ac._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="unqueued" + avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d" + subject="comment 3" + date="2024-04-10T12:46:53Z" + content=""" +I would just clone the repo to the new machine, do `git annex init`, and then rsync the contents of `.git/annex/objects`, and then do `git annex fsck --all` to have it recheck every key it knows about. + +Alternatively, if you're concerned that there might be keys that weren't properly recorded somehow, in your new repo, after `.git/annex/objects` has been transferred, you can create an ingestion directory with a flat layout of the copied keys: + +```bash +mkdir ingest && find .git/annex/objects -type f -exec mv {} ingest/ ';' && git annex reinject --known ingest/* +``` + +Finally, if you just want to rebuild it from scratch, do cp with the `-cL` option. If you are on macOS, it will make a reflink copy, and follow the symlinks. Delete the target .git dir and re-create it. +"""]] diff --git a/doc/todo/compute_special_remote.mdwn b/doc/todo/compute_special_remote.mdwn new file mode 100644 index 0000000000..a4c57c0b27 --- /dev/null +++ b/doc/todo/compute_special_remote.mdwn @@ -0,0 +1,62 @@ +# Enable git-annex to provision file content by other means than download + +This idea [goes back many years](https://github.com/datalad/datalad/issues/2850), and has been [iterated on repeatedly afterwards](https://github.com/datalad/datalad-next/issues/143), and most recently at Distribits 2024. +The following is a summary of what role git-annex could play in this functionality. 
+ +The basic idea is to wrap a provision-by-compute process into the standard interface of a git annex remote. +A consumer would (theoretically) not need to worry about how an annex key is provided, they would simply `git annex get` it, whether this leads to a download or a computation. +Moreover, access cost and redundancies could be managed/communicated using established patterns. + +## Use cases + +Here are a few concrete use cases that illustrate why one would want to have functionality like this. + +### Generate annex keys (that have never existed) + +This can be useful for leaving instructions on how, e.g., other data formats can be generated from a single format that is kept on storage. +For example, a collection of CSV files is stored, but an XLSX variant can be generated upon request automatically. +Or a single large live stream video is stored, and a collection of shorter clips is generated from a cue sheet or cut list. + + +### Re-generate annex keys + +This can be useful when storing a key is expensive, but its exact identity is known/important. For example, an outcome of a scientific computation yields a large output that is expensive to compute and to store, yet needs to be tracked for repeated further processing -- the cost of a recomputation may be reduced by storing (smaller) intermediate results, and leaving instructions on how to perform (a different) computation that yields the identical original output. + +This second scenario, where annex keys are reproduced exactly, can be considered the general case. It generally requires exact inputs to the computation, whereas the first scenario can/should handle an application of a compute instruction on any compatible input data. + + +## What is in scope for git-annex? + +The execution of arbitrary code without any inherent trust is a problem. A problem that git-annex may not want to get into. 
Moreover, there are many candidate environments for code execution -- a complexity that git-annex may not want to get into either. + +### External remote protocol sufficient? + +From my point of view, pretty much all required functionality could be hidden behind the external remote protocol and thereby inside one or more special remote implementations. + +- `STORE`: somehow capture the computing instructions, likely linking some code to some (key-specific) parameters, like input files +- `CHECKPRESENT`: do compute instructions for a key exist? +- `RETRIEVE`: compute the key +- `REMOVE`: remove the instructions/parameter record +- `WHEREIS`: give information on computation/inputs + +where `SET/GETSTATE` may implement the instruction deposit/retrieval. + +### Worktree provisioning? + +Such an external remote implementation would need a way to create suitable worktrees to (re)run given code. Git-annex support to provide a (separate) worktree for a repository at a specific commit, with efficient (re)use of the main repository's annex would simplify such implementations. + +### Request one key, receive many + +It is possible that a single computation yields multiple annex keys, even when git-annex only asked for a single one (which is what it would do, sequentially, when driving a special remote). It would be good to be able to capture that and avoid needless duplication of computations. + +### Instruction deposition + +Using `STORE` (`git annex copy --to`) to record instructions is possible (say particular ENV variables are used that pass information to a special remote), but is more or less a hack. It would be useful to have a dedicated command to accept and deposit such a record in association with one or more annex keys (which may or may not be known at that time). This likely requires settling on a format for such records. 
+ +### Storage redundancy tests + +I believe that no particular handling of annex keys that are declared inputs to computing instructions for other keys is needed. Still listing it here to state that, and be possibly proven wrong. + +### Trust + +We would need a way for users to indicate that they trust a particular compute instruction or the entity that provided it. Even if git-annex does not implement tooling for that, it would be good to settle on a concept that can be interpreted/implemented by such special remotes. diff --git a/doc/todo/compute_special_remote/comment_1_9f4835cd08d9d02009b685f4a366a245._comment b/doc/todo/compute_special_remote/comment_1_9f4835cd08d9d02009b685f4a366a245._comment new file mode 100644 index 0000000000..539e311fff --- /dev/null +++ b/doc/todo/compute_special_remote/comment_1_9f4835cd08d9d02009b685f4a366a245._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="m.risse@77eac2c22d673d5f10305c0bade738ad74055f92" + nickname="m.risse" + avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de" + subject="prior art" + date="2024-04-13T20:30:56Z" + content=""" +I just want to mention that I've implemented/tried to implement something like this in . It basically just records a command line invocation to execute and all required input files as base64-encoded json in a URL with a custom scheme, which made it surprisingly simple to implement. I haven't touched it in a while and it was more of an experiment, but other than issues with dependencies on files in sub-datasets it worked pretty well. The main motivation to build it was the mentioned use-case of automatically converting between file formats. Of course it doesn't address all of your mentioned points. E.g. trust is something I haven't considered in my experiments, at all. But it shows that the current special remote protocol is sufficient for a basic implementation of this. 
+ +I like the proposed \"request one key, receive many\" extension to the special remote protocol and I think that could be useful in other \"unusual\" special remotes as well. + +I don't quite understand the necessity for \"Worktree provisioning\". If I understand that right, I think it would just make things more complicated and unintuitive compared to always staying in HEAD. + +\"Instruction deposition\" is essentially just adding a URL to a key in my implementation, which is pretty nice. Using the built-in relaxed option automatically gives the distinction between generating keys that have never existed and regenerating keys. +"""]] diff --git a/doc/todo/compute_special_remote/comment_2_bdb9c77b3ac97cef8d1b8eeaaf300d8b._comment b/doc/todo/compute_special_remote/comment_2_bdb9c77b3ac97cef8d1b8eeaaf300d8b._comment new file mode 100644 index 0000000000..34d518f309 --- /dev/null +++ b/doc/todo/compute_special_remote/comment_2_bdb9c77b3ac97cef8d1b8eeaaf300d8b._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="mih" + avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd" + subject="Need for more than HEAD/URL?" + date="2024-04-15T05:00:58Z" + content=""" +> \"Instruction deposition\" is essentially just adding a URL to a key in my implementation, which is pretty nice. Using the built-in relaxed option automatically gives the distinction between generating keys that have never existed and regenerating keys. + +Thanks for the pointer, very useful! + +Regarding the points you raised: + +Datalad's `run` feature has been around for some years, and we have seen usage in the wild with command lines that are small programs and dozens, sometimes hundreds of inputs. It is true that anything could be simply URL-encoded. However, especially with command-patterns (always same, except parameter change) that may be needlessly heavy. Maybe it would compress well (likely), but it still poses a maintenance issue. 
Say the compute instructions need an update (software API change): Updating one shared instruction set is a simpler task than sifting through annex-keys and rewriting URLs. + +> I don't quite understand the necessity for \"Worktree provisioning\". If I understand that right, I think it would just make things more complicated and unintuitive compared to always staying in HEAD. + +We need a worktree different from `HEAD` whenever HEAD has changed from the original worktree used for setting up a compute instruction. Say a command needs two input files, but one has been moved to a different directory in current `HEAD`. An implementation would now either say \"no longer available\" and force maintenance update, or be able to provision the respective worktree. In case of no provision capability we would need to replace the URL-encoded instructions (this would make the key uncomputable in earlier versions), or amend with an additional instruction set (and now we would start to accumulate cruft where changes in the git-annex branch need to account for (unrelated) changes in any other branch). +"""]] diff --git a/doc/todo/way_to_instruct_on_how_to_decide_on_extension__63__.mdwn b/doc/todo/way_to_instruct_on_how_to_decide_on_extension__63__.mdwn new file mode 100644 index 0000000000..83e957c8a8 --- /dev/null +++ b/doc/todo/way_to_instruct_on_how_to_decide_on_extension__63__.mdwn @@ -0,0 +1,14 @@ +In our case we are storing videos using timestamp in the filename, e.g. + +``` +2024.03.08.09.31.09.041_2024.03.08.09.34.53.759.mkv +``` + +where last number is `ms`. `git-annex` for MD5E decides that extension is `.759.mkv`, so if we rename file (adjust timing), it seems to produce a new key. + +I wonder if you have any ideas Joey on how to overcome it (smarter extension deduction? some config to "hardcode" target extension to be .mkv?)? 
+ +Just throwing against the wall to see if it sticks + +[[!meta author=yoh]] +[[!tag projects/repronim]] diff --git a/doc/users/nobodyinperson.mdwn b/doc/users/nobodyinperson.mdwn index 56a3d1ee05..b6eff5393c 100644 --- a/doc/users/nobodyinperson.mdwn +++ b/doc/users/nobodyinperson.mdwn @@ -8,6 +8,6 @@ I use git-annex to: I made a [Thunar plugin](https://gitlab.com/nobodyinperson/thunar-plugins) for git-annex, here's a [📹 screencast](https://fosstodon.org/@nobodyinperson/109836827575976439). -In an attempt to [#gitAnnexAllTheThings](https://fosstodon.org/tags/gitAnnexAllTheThings), I used git annex as a backend for a cli time tracker [annextimelog](https://pypi.org/project/annextimelog/). It has similarities with timewarrior but adresses many of its inconveniences. +In an attempt to [#gitAnnexAllTheThings](https://fosstodon.org/tags/gitAnnexAllTheThings), I used git annex as a backend for a cli time tracker [annextimelog](https://pypi.org/project/annextimelog/). It has similarities with timewarrior but addresses many of its inconveniences. I talked about it a bit at my [distribits 2024](https://distribits.live) talk, of which there is a recording [📹 on YouTube](https://www.youtube.com/watch?v=IdRUsn-zB2s). At the [Tübix 2023](https://www.tuebix.org/) I gave a git annex workshop, of which you can find a recording of the initial talk [📹 here (🇩🇪 German)](https://tube.tchncs.de/w/db1ec5ca-94ad-4f49-a507-2124fd699ff1) or [📹 here (English)](https://tube.tchncs.de/w/1U4vbTAhSEje3KQ1dGqvxh) in the fediverse and [📹 here on Odysee (🇩🇪 German)](https://odysee.com/@nobodyinperson:6/T%C3%BCbix2023-Yann-B%C3%BCchau-git-annex:6).