tagged the past 2 years of open todos and followed up on a few of them

also moved some that were really bug reports to bugs/ and closed a
couple
This commit is contained in:
Joey Hess 2020-01-30 15:22:05 -04:00
parent c08d5612ee
commit cffa2446e8
GPG key ID: DB12DB0FF05F8F38
107 changed files with 406 additions and 4 deletions

View file

@ -1 +1,3 @@
since there is no generic 'fuse' mode, I would like to request a `--get` (or `--auto-get`) option for diffdriver. I am trying to compare files across two branches of a repo I just cloned. I cannot download all the files, and downloading differing keys across branches for the same file is a bit painful. So I felt it would be super nice if git annex could auto-get those files from somewhere (well -- the original clone)
[[!tag confirmed]]
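A sketch of how this might look, assuming the hypothetical `--get` flag and using `meld` as an arbitrary external diff command (exact driver syntax per git-annex-diffdriver's man page may differ):

```
# configure git to diff annexed files via git-annex's diff driver;
# --get is the proposed (not yet existing) flag that would auto-download
# missing keys before diffing
git config diff.external 'git annex diffdriver --get -- meld'

# comparing branches would then fetch the differing annexed files as needed
git diff otherbranch -- bigfile.dat
```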

View file

@ -15,3 +15,6 @@ Apologies for the brevity, I've already typed this out once..
git annex import --mode=Ns $src # (just creates symlinks for new)
git annex import --mode=Nsd $src # (invalid mode due to data loss)
git annex import --mode=Nid $src # (invalid or require --force)
> Current thinking is in [[remove_legacy_import_directory_interface]].
> This old todo is redundant, so [[wontfix|done]] --[[Joey]]

View file

@ -19,3 +19,5 @@ There are other situations this is useful (and I use), for example, when I conve
git annex metadata --parentchild original.svg compressed.png
and this would set 'parent' and 'child' metadata respectively.
[[!tag needsthought]]
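For comparison, a rough equivalent with the current interface, assuming `parent` and `child` are plain metadata fields (the proposed option would collapse these two commands into one):

```
# set the relationship by hand with two metadata calls
git annex metadata --set parent=original.svg compressed.png
git annex metadata --set child=compressed.png original.svg
```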

View file

@ -13,4 +13,5 @@ You may ask why it is useful? I have several usecases:
Does git-annex provide such functionality? If not, do you think it could be implemented?
Thanks!
[[!tag unlikely]]

View file

@ -28,3 +28,5 @@ This problem comes up surprisingly often due to:
5. Some repos being too large for a machine (e.g., repacking fails due to low memory), but which can still act like a dumb file-store.
The problem gets worse when you have a lot of remotes or a lot of repos to manage (I have both). My impression is that this feature would require a syntax addition for git-annex-sync only. I like '!' because it behaves the same in GNU find and sh.
[[!tag needsthought]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-01-30T19:13:25Z"
content="""
git-annex sync does support remote groups, so that might also help with
this use case without needing additional syntax?
"""]]

View file

@ -1,3 +1,5 @@
Would it be hard to support MD5E keys that omit the -sSIZE part, the way this is allowed for URL keys? I have a use case where I have the MD5 hashes and filenames of files stored in the cloud, but not their sizes, and want to construct keys for these files to use with setpresentkey and registerurl. I could construct URL keys, but then I lose the error-checking and have to set annex.security.allow-unverified-downloads. Or maybe, extend URL keys to permit an -hMD5 hash to be part of the key?
Another (and more generally useful) solution would be [[todo/alternate_keys_for_same_content/]]. Then one can start with a URL-based key but later attach an MD5 to it as metadata, and have the key treated as a checksum-containing key, without needing to migrate the contents to a new key.
[[!tag moreinfo]]
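For reference, how such a key would be used with those commands today; the key, url, and uuid below are placeholders, and the size-less form is hypothetical:

```
# current MD5E keys require the -s size field:
key='MD5E-s1048576--d41d8cd98f00b204e9800998ecf8427e.bam'
# the proposal would allow e.g.: MD5E--d41d8cd98f00b204e9800998ecf8427e.bam

git annex registerurl "$key" 'https://example.com/file.bam'
git annex setpresentkey "$key" 00000000-0000-0000-0000-000000000000 1
```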

View file

@ -1,3 +1,5 @@
S3 lets you [redirect](https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html) requests for an object to another object, or to a URL. This could be used to export a git branch, in the manner of [[`git-annex-export`|git-annex-export]], but with annexed objects redirecting to a key-value S3 remote in the same bucket.
Related: [[todo/simpler__44___trusted_export_remotes]] ; [[forum/Using_hashdirlower_layout_for_S3_special_remote]].
[[!tag needsthought unlikely]]

View file

@ -63,3 +63,5 @@ Thankfully, we already have a technology that can fill in elegantly here: parity
This would also enhance the data-checking capabilities of git-annex, as data loss could be fixed and new parity files generated from the recovered files transparently, self-healing the archive.
[[!tag unlikely]]

View file

@ -4,3 +4,5 @@ I have a bunch of files I want to track with `git-annex` that are sitting in an
git-annex import --to=s3-remote /mnt/usb-drive/myfiles
The proposed `--to=remote` option would add the files to my repo as `import` normally does, but it wouldn't ever keep the content in the repo; the only copy would now sit in `s3-remote`. As little disk space as possible would be staged temporarily in `~/my-laptop-repo`. Perhaps the easiest option would be to import a file normally, but then immediately do a `move` to `s3-remote`? But, ideally for larger files, we would want to stream them directly from `/mnt/usb-drive/myfiles` to `s3-remote` without ever staging them at `~/my-laptop-repo`.
[[!tag unlikely needsthought]]
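The import-then-move idea from the text, as a sketch; it still stages each file locally for a moment:

```
git annex import /mnt/usb-drive/myfiles
git annex move --to=s3-remote myfiles
```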

View file

@ -12,3 +12,5 @@ I often transfer files via mediums that have transfer limits, but I am eventuall
Currently, I've been using tricks to select a subset of the files, such as a range of file-sizes.
[[!tag needsthought]]
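One of the size-range tricks alluded to above, using the existing matching options (remote name and limits are illustrative):

```
# copy only files between 100MB and 1GB, to stay under a transfer quota
git annex copy --to=limitedremote --largerthan=100mb --smallerthan=1gb
```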

View file

@ -21,3 +21,5 @@ repeatedly (though ssh connection caching helps some with that).
> exposes this, when available. Some sftp servers can be locked down
> so that the user can't run git-annex on them, so that could be the only
> way to get diskreserve working for such a remote. --[[Joey]]
[[!tag confirmed]]

View file

@ -1 +1,3 @@
To [[git-annex-test]] and [[git-annex-testremote]], add an option to run tests under concurrency (-J). Many possible bugs are unique to the concurrent case, and it's the case I often use. While any bugs detected may be hard to reproduce, it's important to know _whether_ there are concurrency-related bugs. Much of the trust in git-annex comes from its extensive test suite, but it's somewhat concerning to trust it with important data when the concurrency case is not tested at all.
[[!tag unlikely]]

View file

@ -1 +1,3 @@
From https://cyan4973.github.io/xxHash/ , xxHash seems much faster than md5 with comparable quality. There's a Haskell implementation.
[[!tag moreinfo]]

View file

@ -9,3 +9,4 @@ Also, sometimes one can determine the MD5 from the URL without downloading the f
or because an MD5 was computed by a workflow manager that produced the file (Cromwell does this). The special remote's "CHECKURL" implementation could record an MD5E key in the
alt_keys metadata field of the URL key. Then 'addurl --fast' could check alt_keys, and store in git an MD5E key rather than a URL key, if available.
[[!tag unlikely]]

View file

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-30T18:36:17Z"
content="""
This would mean that, every time something about a key is looked up in the
git-annex branch, it would also need to look at the metadata to see if this
`alt_keys` field is set.
So it doubles the time of every single query of the git-annex branch.
I don't think that's a good idea; querying the git-annex branch is already
often a bottleneck for commands.
"""]]

View file

@ -13,3 +13,5 @@ need a git hook run before checkout to rescue such files.
Also some parts of git-annex's code, including `withObjectLoc`, assume
that the .annex/objects is present, and so it would need to be changed
to look at the work tree file. --[[Joey]]
[[!tag needsthought]]

View file

@ -11,3 +11,6 @@ autobuilder? --[[Joey]]
Currently running release builds for arm64 on my phone, but it's not
practical to run an autobuilder there. --[[Joey]]
>> [[done]]; the current qemu based autobuilder is not ideal, often gets
>> stuck, but there's no point leaving this todo open. --[[Joey]]

View file

@ -3,3 +3,5 @@ I think it would be useful if the assistant (when monitoring a repo) could autom
If I then add each repo as a remote of the other (from the command-line), assistant will still not sync files between the repos until I stop all the assistants running and then restart them. Presumably only on launch does the assistant check the list of remotes?
I think this is perhaps causing issues for users not just on the command-line but also for users who create multiple local remotes from the webapp and then combine them, since the webapp is perhaps not restarting the assistant daemons after the combine operation? I'm not sure about this…
[[!tag confirmed]]

View file

@ -9,3 +9,5 @@ This would involve:
* The assistant ought to update the adjusted branch at some point after
downloads, but it's not clear when. Perhaps this will need to be deferred
until it can be done more cheaply, so it can do it after every file.
[[!tag confirmed]]

View file

@ -1,3 +1,5 @@
Can an option be added to unlock a file in such a way that the next time it gets committed, it is automatically re-locked? Or to just have this done for all unlocked files?
It's a common use case to just do one edit / re-generation of a locked file. If you forget to lock it (or a script that was supposed to lock it after modification fails in the middle), you end up with a permanently unlocked file, which can cause [[performance issues|bugs/git_status_extremely_slow_with_v7]] downstream, and also [[look odd when missing|todo/symlinks_for_not-present_unlocked_files]], lead to multiple copies when present (or risk [[annex.thin issues|bugs/annex.thin_can_cause_corrupt___40__not_just_missing__41___data]]), and leave the file open to inadvertent/unintended modification. Also, locking the file manually litters the git log with commits that don't really change repo contents.
[[!tag needsthought]]
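The failure-prone manual workflow that the proposed option would replace, as a sketch (script name is illustrative):

```
git annex unlock bigfile.dat
./regenerate.sh bigfile.dat   # if this dies here, the file stays unlocked
git annex add bigfile.dat     # ingest the new version
git annex lock bigfile.dat    # easy to forget; the proposal would relock at commit time
git commit -m 'regenerate bigfile'
```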

View file

@ -1 +1,3 @@
Current special remote protocol works on one file at a time. With some remotes, a batch operation can be more efficient, e.g. querying the status of many URLs in one API call. It would be good if special remotes could optionally implement batch versions of their operations, and these versions were used by batch-mode git-annex commands. Or maybe, keep the current set of commands but let the remote read multiple requests and then send multiple replies?
[[!tag moreinfo]]

View file

@ -9,3 +9,5 @@ object in it.
This should be fixable by eg, catching all exceptions when running Annex
operations on a remote, adding its path to the message and rethrowing.
--[[Joey]]
[[!tag confirmed]]

View file

@ -32,3 +32,5 @@ Two open questions:
objects over time. So leave the update up to the user to run the command
when they want it? But then the user may get confused: why did it
download files and they didn't appear?
[[!tag needsthought]]

View file

@ -13,3 +13,5 @@ backups, and git-annex would then be aware of what was backed up in borg,
and could do things like count that as a copy.
--[[Joey]]
[[!tag needsthought]]

View file

@ -3,3 +3,5 @@
Changing the default would also let one [[repeatedly re-import a directory while keeping original files in place|bugs/impossible__40____63____41___to_continuously_re-import_a_directory_while_keeping_original_files_in_place]].
I realize this would be a breaking change for some workflows; warning of it [[like git does|todo/warn_of_breaking_changes_same_way_git_does]] would mitigate the breakage.
[[!tag unlikely]]

View file

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-01-30T17:09:00Z"
content="""
See [[todo/remove_legacy_import_directory_interface]].
"""]]

View file

@ -33,3 +33,6 @@ be useful to speed up checks on larger files. The license is a
I know it might sound like a conflict of interest, but I *swear* I am
not bringing this up only as an oblique feline reference. ;) -- [[anarcat]]
> Let's concentrate on xxhash or other new hashes that are getting general
> adoption, not niche hashes like meow. [[done]] --[[Joey]]

View file

@ -32,3 +32,5 @@ surprise users... I suggest using a logic similar to
[[git-annex-import]] for consistency reasons.
Thanks! -- [[anarcat]]
[[!tag unlikely]]

View file

@ -1 +1,3 @@
If an external special remote is implemented as a Docker container, it can be safely autoenabled and run in a sandboxed way. So the distributor of a repo that has annex files fetchable with a given special remote could have the docker tag for the special remote configured on the git-annex branch, and users could then clone and use the repo without needing to install anything.
[[!tag needsthought]]

View file

@ -1 +1,3 @@
It would help to document, in one place, the external programs and libraries on which git-annex depends for various functionalities, including optional ones. Ones I know: curl, gpg, bup. But there are also references in places to lsof, rsync, nocache. For reliable packaging, it would be useful to have an authoritative list of dependencies and which functionality each supports.
[[!tag unlikely]]

View file

@ -1,3 +1,5 @@
If a spec of the [[sqlite database schemas|todo/sqlite_database_improvements]] could be added to the [[internals]] docs, this would open some possibilities for third-party tools based on this info. E.g. one could write some sqlite3 queries to get aggregate info on the number (and total size?) of keys present in specific combinations of repos. It would of course be understood that this is internal info subject to frequent change.
Also, if [[Sometimes the databases are used for data that has not yet been committed to git|devblog/day_607__v8_is_done]], this would improve [[future_proofing]].
[[!tag needsthought unlikely]]

View file

@ -1 +1,3 @@
Is it possible to add an option, for initremote/enableremote, to encrypt the credentials but not the contents? Then it would be possible to have an exporttree remote while using embedcreds. It would also be good if locally stored credentials could be stored in encrypted form, and decrypted for use as needed. I'm uneasy about keeping credentials accessible without a passphrase.
[[!tag confirmed]]
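The conflict as it stands, sketched with an S3 export remote (bucket and remote names are placeholders): exporttree remotes can't use encryption, and embedcreds then lands the credentials unencrypted in the git-annex branch.

```
git annex initremote mys3 type=S3 bucket=my-export-bucket \
	exporttree=yes encryption=none embedcreds=yes
```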

View file

@ -7,3 +7,4 @@ store files under paths like s3://mybucket/randomstring/myfile ; the URL is "pub
If the URLs could be stored encrypted in the git-annex branch, one could track such files using the ordinary web remote. One could use an S3 export-tree
remote to share a directory with specific recipient(s), without them needing either AWS credentials or git-annex.
[[!tag unlikely moreinfo]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-30T18:25:58Z"
content="""
Is this about SETURLPRESENT in an external special remote, or is addurl
also supposed to encrypt a url? And how would addurl know whether the user
wants to encrypt it, and using what gpg keys?
If your git-annex repo contains information about files you want to remain
private, why not just keep that repo private?
"""]]

View file

@ -8,3 +8,5 @@ Perhaps: Find pairs of renames that swap content between two files.
Run each pair in turn. Then run the current rename code. Although this
still probably misses cases where, eg, content cycles among 3 files, and
the same content among 3 other files. Is there a general algorithm?
[[!tag needsthought]]

View file

@ -3,3 +3,5 @@ It would be good if one could define custom external [[backends]], the way one c
@joey pointed out a potential problem: "needing to deal with the backend being missing or failing to work could have wide repercussions in the code base." I wonder if there are ways around that. Suppose you specified a default backend to use in case a custom one was unavailable? Then you could always compute a key from a file, even if it's not in the right backend. And once a key is stored in git-annex, most of git-annex treats the key as just a string. Even if the custom backend supports checksum verification, when its implementation is unavailable its keys would be treated like WORM/URL keys, which do not support checksum checking.
Thoughts?
[[!tag needsthought]]

View file

@ -48,3 +48,5 @@ subsequent WHEREIS, which may complicate its code slightly.
Note that the protocol does allow querying with GETCONFIG etc before
responding to a WHEREIS request.
[[!tag confirmed]]

View file

@ -3,3 +3,5 @@ It would be useful to have a [[`git-annex-cat`|forum/Is_there_a___34__git_annex_
If the file is not present, or `remote.here.cost` is higher than `remote.someremote.cost` where the file is present, `someremote` would get a `TRANSFER` request where the `FILE` argument is a named pipe, and a `cat` of that named pipe would be started.
If the file is not annexed, for uniformity `git-annex-cat file` would just call `cat file`.
[[!tag needsthought]]
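A naive stand-in, for contrast with the streaming design above; it fetches the whole file before printing anything:

```
#!/bin/sh
# annexcat (sketch): get the file if needed, then cat it; also works
# unchanged for non-annexed files
git annex get --quiet -- "$1" && cat -- "$1"
```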

View file

@ -5,3 +5,5 @@ Now I followed the documentation about the special remote adb and created that r
That is caused by the fact that I don't have the files checked out on my workstation. I don't need the files on this PC, so it would be wasteful to check out partially huge files there; in other words, since I don't need the files at that place, I don't see why the export command doesn't have a --from option telling it where to get the files.
Is there a reason that option does not exist, and if so, what would be a way to send files to the Android device without ssh-ing into my server?
[[!tag unlikely]]

View file

@ -1 +1,3 @@
Can git-annex-get be extended so that "git-annex-get --batch --key" fetches the keys (rather than filenames) given in the input?
[[!tag needsthought]]

View file

@ -1,3 +1,5 @@
Currently, git-annex-migrate leads to content (and metadata) being stored under both old and new keys. git-annex-unused can drop the contents under the old key, but then you can't access the content if you check out an older commit. Maybe, an option can be added to migrate keys using [git-replace](https://git-scm.com/docs/git-replace) ? You'd git-replace the blob .git/annex/objects/old_key with the blob .git/annex/objects/new_key, the blob ../.git/annex/objects/old_key with the blob ../.git/annex/objects/new_key , etc. You could then also have a setting to auto-migrate non-checksum keys to checksum keys whenever the content gets downloaded.
More generally, git-annex-replace could be implemented this way, doing what git-replace does, but for git-annex keys rather than git hashes. [[git-annex-pre-commit]] might need to be changed to implement replacement of keys added later.
[[!tag needsthought]]

View file

@ -1 +1,3 @@
When using [[linked worktrees|tips/Using_git-worktree_with_annex]], the main tree is currently handled differently from the linked trees: "if there is change in the tree then syncing doesn't update git worktrees and their indices, but updates the checked out branches. This is different to the handling of the main working directory as it's either got updated or left behind with its branch if there is a conflict." Is there a reason for this? Could linked worktrees be treated the same as the main one?
[[!tag moreinfo]]

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-30T17:12:40Z"
content="""
That tip was written by leni536, and I don't really understand what it's
talking about with a difference in sync behavior. I'm not sure it's
accurate or describes what happens clearly.
To me it seems really simple, no matter if you have a regular work tree, or
are using git-worktree or whatever: sync fetches, merges, and pushes. Merging
updates the current work tree, and AFAIK not whatever other work trees might
be using the same .git repository. In any case, sync should behave the same
as git pull as far as updating work trees goes.
Can you please show an example of whatever problem you may have with the
current behavior?
"""]]

View file

@ -1 +1,3 @@
git-annex-test failures sometimes reflect failures not of git-annex but of external utils on which it relies. E.g. when my installation or configuration of gpg has problems, the git-annex test suite fails due to the tests that rely on gpg. (And there doesn't seem to be a simple way to skip tests that match a regexp.) git-annex could avoid that by running some simple sanity checks (beyond mere existence) on gpg and other optional dependencies, and skipping tests if these checks fail. E.g. if simple test commands to encrypt/sign a small file with gpg fail, then skip gpg-based tests (and warn the user).
[[!tag unlikely]]

View file

@ -26,3 +26,5 @@ I would be willing to contribute some patches and although I have a respectable
As a sidenote, I don't know how a repo containing about 300k files jumped to 1400k git objects within the last 2 months.
Any feedback welcome, thanks.
[[!tag needsthought unlikely]]

View file

@ -6,3 +6,5 @@ A few possibilities:
- Create branches or tags in an annex that collect a set of version-compatible checkouts for related projects. The commit/tag messages provide a natural place for meta-commentary
- Save and version files that aren't quite junk but don't belong *in* a repo (logs, dumps, backups, editor project/workspace files, notes/to-do lists, build-artifacts, test-coverage/linter stat databases, shell history) alongside the repo, making it easier to have a consistent environment for working on one project across multiple systems.
- Make separate system-specific "master" branches for the main projects directory on each system, then edit and push changes from any other. For example, prep the projects directory on an infrequently-used laptop from your desktop and push/pull the changes.
[[!tag unlikely moreinfo]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-01-30T18:20:42Z"
content="""
This seems, at first glance, entirely out of scope for git-annex.
There are other things that manage lots of git repositories. I've written one
even (myrepos).
"""]]

View file

@ -109,3 +109,5 @@ The best fix would be to improve git's smudge/clean interface:
* Allow clean filter to read work tree files itself, to avoid overhead of
sending huge files through a pipe.
[[!tag confirmed]]

View file

@ -9,3 +9,6 @@ use restagePointerFile, but that did not help; git update-index does then
smudge it during the `git annex unlock`, which is no faster (but at least
doing it then would avoid the surprise of a slow `git status` or `git
commit -a`). Afterwards, `git status` then smudged it again, unsure why!
--[[Joey]]
[[!tag confirmed]]

View file

@ -268,3 +268,5 @@ decreases as it goes?
---
See also, [[adb_special_remote]]
[[!tag confirmed]]

View file

@ -36,3 +36,5 @@ importtree, but there are several roadblocks:
So, it seems that, importtree would need to be able to run commands
other than rsync on the server. --[[Joey]]
[[!tag needsthought]]

View file

@ -1 +1,3 @@
The documentation for the new import remote command says, "Importing from a special remote first downloads all new content from it". For many special remotes -- such as Google Cloud Storage or DNAnexus -- checksums and sizes of files can be determined without downloading the files. For other special remotes, data files might have associated checksum files (e.g. md5) stored next to them in the remote. In such cases, it would help to be able to import the files without downloading them (which can be costly, especially due to cloud provider egress charges), similar to addurl --fast.
[[!tag needsthought]]
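The addurl analogue mentioned above, for comparison (url and filename are placeholders):

```
# records a key for the url without downloading the content
git annex addurl --fast --file=data/sample.bam 'https://example.com/sample.bam'
```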

View file

@ -12,3 +12,5 @@ An attempt at making it stream via unsafeInterleaveIO failed miserably
and that is not the right approach. This would be a good place to use
ResourceT, but it might need some changes to the Annex monad to allow
combining the two. --[[Joey]]
[[!tag confirmed]]

View file

@ -1 +1,3 @@
Currently, the git-annex branch is not checked out, but is accessed as needed with commands like git cat-file. Could git-annex work faster if it kept the git-annex branch checked out? Especially if one could designate a fast location (like a ramdisk) for keeping the checked-out copy. Maybe git-worktree could be used to tie the separate checkout to the repository.
[[!tag unlikely]]

View file

@ -1 +1,3 @@
Would it be hard to add a variation to checksumming [[backends]], that would change how the checksum is computed: instead of computing it on the whole file, it would first be computed on file chunks of given size, and then the final checksum computed on the concatenation of the chunk checksums? You'd add a new [[key field|internals/key_format]], say cNNNNN, specifying the chunking size (the last chunk might be shorter). Then (1) for large files, checksum computation could be parallelized (there could be a config option specifying the default chunk size for newly added files); (2) I often have large files on a remote, for which I have md5 for each chunk, but not for the full file; this would enable me to register the location of these files with git-annex without downloading them, while still using a checksum-based key.
[[!tag needsthought]]
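A sketch of the proposed computation, done by hand with coreutils; whether chunk checksums would be concatenated as hex text or raw bytes is unspecified, so this assumes hex:

```
# chunk size 1GiB; final checksum = md5 of the concatenated chunk md5s
split -b 1073741824 bigfile chunk-
md5sum chunk-* | awk '{printf "%s", $1}' | md5sum
```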

View file

@ -26,3 +26,5 @@ It could undo the de-prioritization when it sees that the network has
changed.
--[[Joey]]
[[!tag needsthought]]

View file

@ -1,3 +1,5 @@
In the [[design/external_special_remote_protocol]], the `File` parameter of various requests is specified to be a regular file. If it could be a named pipe, this would open up useful possibilities: [[todo/git-annex-cat]], [[todo/transitive_transfers]], [[todo/git-annex-export_--from_option]], [[todo/OPT__58_____34__bundle__34___get_+_check___40__of_checksum__41___in_a_single_operation/]], [[todo/to_and_from_multiple_remotes]], faster [[`git-annex-fsck --from`|git-annex-fsck]], passing named pipes on `git-annex` command line (for streaming the outputs of a running command directly to a remote, or using `git-annex` as a building block of larger workflows), and maybe others.
An optional protocol request `NAMEDPIPESSUPPORTED`, similar to [[`EXPORTSUPPORTED`|design/external_special_remote_protocol/export_and_import_appendix#index1h2]], could tell `git-annex` that the remote supports named pipes. For remotes that don't declare such support, it could be emulated: before sending e.g. `TRANSFER STORE Key File`, if `File` is a pipe and the remote hasn't said it supports pipes, `git-annex` would drain the pipe to a `TempFile` and then send `TRANSFER STORE Key TempFile` instead. Then the rest of `git-annex` can presume pipes support.
[[!tag needsthought]]

View file

@ -14,3 +14,6 @@ drives, but not from the network. --[[Joey]]
> repository, in another repository it could be a fairly low cost. The user
> would need to examine all the costs to pick the cost they want; using
> remote names seems better UI. --[[Joey]]
> > that seems convincing reason not to implement this and instead
> > implement remote groups. [[wontfix|done]] --[[Joey]]

View file

@ -48,3 +48,5 @@ fed the names of files to operate on via stdin.
> These hooks may be too specific to this purpose, while a more generalized
> hook could also support things like [[storing_xattrs|support_for_storing_xattrs]]
> --[[Joey]]
[[!tag needsthought]]

View file

@ -3,3 +3,5 @@ I want to add some dotfiles in the root of my repository to git-annex as unlocke
I was thinking that it might make sense to have a `git annex config` option to tell git-annex not to add the `.* !filter` line to `.git/info/attributes` when initialising other clones of this repo. In the meantime, I've worked around it using a `post_checkout` hook in my `~/.mrconfig` which edits `.git/info/attributes`.
--spwhitton
[[!tag needsthought]]
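For context, the lines git-annex init writes to `.git/info/attributes`; the second is the one at issue:

```
* filter=annex
.* !filter
```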

View file

@ -14,3 +14,6 @@ to my surprise all i got was the retrial of the existing meta-data instead of th
IMHO git annex should allow storing metadata in batch mode by key
[[!meta title="metadata --batch parses json strictly, loosen?"]]
> [[done]] I guess, as there's been no response to my question in over a
> year. --[[Joey]]

View file

@ -14,3 +14,5 @@ then B exported a tree containing `[foo, bar]`, and then A exported
So, if one exported tree is a subset of the other, it's not necessary to
unexport files added by the other tree. It's sufficient to check that files
are present in the export and upload any that are missing. --[[Joey]]
[[!tag confirmed]]

View file

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-01-30T17:45:39Z"
content="""
I don't think that git-annex can generally abort an operation that is
outright hung. While it's certainly possible to kill a worker thread, if
that thread has other threads associated with it, they could keep on using
resources. And if an external command is hung, the command would keep
running. The only way to guarantee such an abort is to kill the whole
git-annex process and let the signal reap its children. That's what the
assistant does when the UI is used to stop a transfer: it kills the whole
`git-annex transferkeys` process.
(A locked git index file does not prevent git-annex from making transfers
so AFAICS the comment above is not relevant.)
"""]]

View file

@ -1,3 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-01-30T17:52:31Z"
content="""
Moving a similar todo I wrote to here:
I'd like an option that makes transfers (get,copy,etc) of files fail if the
transfer speed falls below a given rate.
@ -8,6 +15,5 @@ at the usual speed and skipping the ones that are coming too slow. Then
I can see what files it failed on and either resume those or see if I have
a copy of them somewhere else.
I imagine there could be other use cases...
--[[Joey]]
(Unfortunately, implementing that has the same problems..)
"""]]

View file

@ -26,3 +26,5 @@ In any case, the new test suite would need to be run somewhere;
running it on at least some of the autobuilders might be a good way.
--[[Joey]]
[[!tag confirmed]]

View file

@ -1 +1,4 @@
Sometimes you want to operate on files touched by commits in a range, e.g. to `git-annex-copy` files added in the last 10 commits to an S3 special remote. Could the option be added, to commands that take a path to operate on, to give a commit range, with the meaning "operate on files changed by these commits"?
> Since my comment gives a way to do it, and there was no followup, I think
> this is [[done]] --[[Joey]]
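One way to do it with plain git, along the lines that comment suggests (remote name and range are placeholders; files deleted within the range will produce warnings):

```
git diff --name-only -z HEAD~10..HEAD | xargs -0 git annex copy --to=s3-remote --
```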

View file

@ -1,3 +1,5 @@
Profiling of `git annex find --not --in web` suggests that converting Ref
to contain a ByteString, rather than a String, would eliminate a
fromRawFilePath that uses about 1% of runtime.
[[!tag confirmed]]

View file

@ -19,3 +19,5 @@ writer and it would have already behaved as it would after the change.
But: When a process writes to the journal, it will need to update its state
to remember it's no longer empty. --[[Joey]]
[[!tag confirmed]]

View file

@ -4,3 +4,5 @@ there are use-cases in which it would come in handy to have an option for a spec
For example, I use git annex for very large scientific tomographic datasets and files originating from their processing, like segmentations, distance maps, and skeletons. While compressing the raw data makes little sense, compressing e.g. segmentations and skeletons has a huge impact on the effective file size. Since compressing files of a few GBs to TBs is time consuming, I prefer to have an uncompressed version in the working tree (so I do not use file formats that compress by default, e.g. .nii.gz), but it would be very helpful to have the option to push precious or older versions to a remote that then uses compression. Using encryption for this is a bit of an overkill and takes considerably longer than compressing with e.g. `pbzip`. A compressed file system is no option for this purpose, because the special remote is supposed to live on a restrictive archive server.
Though I guess it would be possible to write a special remote wrapper for this, I wonder if it might qualify as an officially supported option to the already existing special remotes like "directory" or "rsync", e.g., in conjunction with `encryption`, something like `compression` with possible values like `pbzip`, `bzip`, `pigz` and `gzip`.
[[!tag confirmed]]
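What such an option might look like on initremote; the `compression=` parameter is hypothetical, not an existing git-annex option:

```
# hypothetical: a directory special remote that compresses stored objects
git annex initremote archive type=directory directory=/srv/archive \
	encryption=none compression=pbzip2
```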

View file

@ -11,3 +11,5 @@ Cleaner would be to add a field to the key, as in MD5E-s0-uUSERKEYSTRING--d41d8c
This enables attaching metadata not to file contents, but to the file itself; or partitioning keys (and therefore key metadata) into namespaces. The downside is some loss of
deduplication. This loss may be acceptable. The loss can be mitigated for local repo and non-special remotes: after storing an object with e.g. MD5 d41d8cd98f00b204e9800998ecf8427e under .git/annex/objects, check if there is a symlink .git/annex/contenthash/d41d8cd98f00b204e9800998ecf8427e ; if not, make this a symlink to the object just stored; if yes,
erase the object just stored, and hardlink the symlink's target instead.
[[!tag unlikely moreinfo]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-01-30T18:58:13Z"
content="""
Is there any reason to leave this todo open since [[external_backends]]
would presumably let it be implemented?
"""]]

View file

@ -1,3 +1,5 @@
One way I've lost data is to git-annex-add it in an untrusted temp clone of a repo, then commit and push the git branch, but forget to git-annex-copy the annexed contents referenced by that branch to a GloballyAvailable, (semi-)trusted remote. Then, when the temp clone is gone, the branch pushed to the repo is referencing permanently dead files. Maybe, git-annex-init could install a pre-push hook to check for this, and abort the push if it happens? Basically, to ensure that whatever data is referenced by pushed branches will actually be at least potentially gettable.
Even if the current repo is not temp/untrusted, when sharing data with someone, you may want to ensure that any annexed files referenced by a pushed branch are actually potentially available.
[[!tag moreinfo]]
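A very rough sketch of such a hook, assuming a group of trusted remotes named `archive` and using existing matching options; a real implementation would need to handle deletions and performance:

```
#!/bin/sh
# .git/hooks/pre-push (sketch): refuse to push a ref whose annexed files
# have no copy in the "archive" group
while read local_ref local_sha remote_ref remote_sha; do
	missing=$(git annex find --branch="$local_ref" --not --copies=archive:1)
	if [ -n "$missing" ]; then
		echo "refusing push; no archive copy of:" >&2
		echo "$missing" >&2
		exit 1
	fi
done
```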

View file

@ -1,3 +1,5 @@
Add an option to give git-annex a path to a RAM disk, and an option to set the maximum space to be used there. git-annex often knows the size of the files it is downloading, since it's part of the key, so it can determine in advance whether a tempfile of that size would fit on the RAM disk. One could instead symlink `.git/annex/tmp/` to a RAM disk, but this could cause memory overflow if a large file is transferred.
Related: [[todo/keep_git-annex_branch_checked_out__63__]], [[todo/transitive_transfers]]
[[!tag unlikely]]
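The symlink workaround from the text, sketched; mount point and size are placeholders, and the memory-overflow caveat above still applies:

```
mount -t tmpfs -o size=8g tmpfs /mnt/ramdisk
mkdir /mnt/ramdisk/annex-tmp
rm -rf .git/annex/tmp        # with no transfers in flight
ln -s /mnt/ramdisk/annex-tmp .git/annex/tmp
```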

View file

@ -20,3 +20,5 @@ At some point in the future, once all git-annex and git-annex-shell
can be assumed to be upgraded to 6.20180312, this fallback can be removed.
It will allow removing a lot of code from git-annex-shell and a lot of
fallback code from Remote.Git.
[[!tag confirmed]]

View file

@ -117,3 +117,5 @@ to a change to the master branch.
But room needs to be left to add this kind of thing. Ie, what git-annex
adds to the git patch needs to have its own expansion point.
[[!tag needsthought]]

View file

@ -1 +1,3 @@
Currently, if I do some work on an experimental branch, creating some annexed files, then abandon the branch, information about keys created on the experimental branch will remain in the git-annex branch. This breaks git's normal notion of lightweight branching, where you can work on an experimental branch and, if you later decide to abandon that work, it'll be as if the experimental branch never existed. Maybe, it would make sense to have, for each branch mybranch, a corresponding branch git-annex-b/mybranch , which would hold the state of the git-annex branch reflecting work on mybranch? Then, if you decide to merge mybranch into master, git-annex-b/mybranch would get union-merged into the git-annex branch (or into git-annex-b/master). But if you decide to abandon/delete mybranch, git-annex-b/mybranch can be abandoned/deleted with no trace left in the main git-annex branch.
> [[wontfix|done]] --[[Joey]]

View file

@ -0,0 +1,28 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-30T18:10:49Z"
content="""
That won't work, and here's why:
You're in master, and you git checkout -b tmp
Now you're in tmp, and you git-annex move foo --from origin
Now you git checkout master. You delete tmp and tmp/git-annex.
Except, foo has been moved from origin to the local repo. So now the local
repo doesn't know it contains foo, at least until git-annex fsck notices
it's there. Worse, no other repo knows where foo went, only that it was
deleted from origin.
Notice also that, even if you keep tmp around, tmp/git-annex must never get
pushed, unless tmp gets merged back into master. So even without deleting
tmp, you get into this situation where other clones don't know where the
file went.
---
git-annex v0 behaved just like this, and it quickly became apparent that it
was not a good idea due to this kind of scenario.
"""]]

View file

@ -1 +1,3 @@
Right now, when computing a WORM key from a relative path or a URL key from a URL, if the original string is longer than a SHA256 checksum, its tail is replaced with its md5. Unfortunately, this eats up the file extension(s) at the end, causing the issues that \*E backends solve. It would be better to keep the tail of the path and replace the start or the middle with the md5, preserving extensions (as configured in annex.maxextensionlength) the same way \*E backends do. Maybe also, add a config option for the length beyond which the replacement-with-checksum happens?
[[!tag confirmed]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-30T18:43:19Z"
content="""
I think it would be safe to make this change for WORM keys, which
certainly don't need to generate the same key for 2 files with the same
name.
Less sure about URL keys: if two git-annex addurl versions pick different
keys for the same url, then there would be a merge conflict, where
currently there is not. I think I've addurled the same url in different
clones of a repo before, probably. Although addurl with and without --fast
or --relaxed also causes that problem and maybe it's not worth worrying
about it.
"""]]

View file

@ -1 +1,3 @@
If a git-annex repo is copied (e.g. by creating an AWS volume from a snapshot), there is a possibility of different repo copies with the same UUID. It would help if there was an option to [[`git-annex-reinit`|git-annex-reinit]] that would create a new uuid for the current repo.
[[!tag moreinfo]]
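A workaround sometimes suggested, assuming `uuidgen` is available; reinit is documented for reusing a lost repository's uuid, so treat this as a sketch rather than a supported interface:

```
# give the copied repo a fresh uuid instead of the snapshot's
git annex reinit "$(uuidgen)"
```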

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-01-30T17:36:09Z"
content="""
I don't want to complicate the location logs with time-dependent sameas
hacks.
Is this repo that's been copied a special remote? fsck --fast --from would
then not be very fast since it has to talk to the special remote. A
dedicated command could be faster than that.
If the repo is a git-annex repo though, I'd expect git annex fsck --fast
to be nearly optimal; the only extra work it does over such a dedicated
command, I think, is a stat of the object file to check if it's present.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-01-30T19:21:00Z"
content="""
See [[todo/reinit_should_work_without_arguments]] for another argument for
the same thing.
"""]]

View file

@ -63,3 +63,6 @@ repositories:
git annex sync
Thanks for any feedback or comments... -- [[anarcat]]
> [[done]], as duplicate of [[todo/reinit_current_repo_to_new_uuid]]
> --[[Joey]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-01-30T19:19:53Z"
content="""
The same thing is also being discussed at [[todo/reinit_current_repo_to_new_uuid]]
so I'm closing this todo in favor of that one.
"""]]

View file

@ -17,3 +17,5 @@ git annex find . --in here --and --not --in kbfs | while read filename
```
but this means that every `git annex copy` command creates a new commit per file transferred, rather than a single commit at the end of the transfer. This may not seem like a big deal, but multiplying that over hundreds of files, it adds up to quite a bit of wasted disk space. (I'll also be looking into ways to squash or prune such commits, but it'd be nice to not have to do that.)
[[!tag confirmed needsthought]]

View file

@ -45,3 +45,5 @@ cases, convert to the new interface, and keep others using the old
interface.
--[[Joey]]
[[!tag needsthought]]

View file

@ -2,3 +2,6 @@ The link targets of annexed files are currently very long. This creates proble
Or, if you're tired of backend requests, maybe implement a scheme for external backends, like the one for external special remotes? For external backend EXTNNN the user would put a script git-annex-external-backend-NNN in the path; the script would support commands like calckey, examinekey. Then I could also implement e.g. canonicalizing backends that strip away variable but semantically irrelevant information before computing the checksum.
[[!meta title="avoid duplicating key twice in symlink to object file"]]
[[!tag unlikely]]

View file

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2020-01-30T18:49:47Z"
content="""
Since there is a separate todo item [[external_backends]], let's not
discuss that idea here.
key/f would have been a great idea to have had 10 years ago.
(Although it does mean that if the object file somehow gets moved out of
its directory, there's no indication in its name that it's a git-annex
object file)
But if that's all this todo is about, we'd need some kind of transition
plan for existing repos with history containing symlinks to key/key.
I doubt there is a good way to make that transition.
"""]]

View file

@ -1,3 +1,5 @@
Currently, some issues impede the use of export remotes: (1) they're untrusted, except for versioned ones -- and from those, keys cannot be dropped; (2) using them is different than using normal remotes: one can't just copy or move keys to them, one has to first make a tree-ish. Maybe this could be fixed as follows:

* To copy a key to an export remote, if the key is not yet present in it, put it under .keys/aaa/bbb/keyname on the remote: take the tree-ish currently on the remote, merge .keys/aaa/bbb/keyname into it, and put that on the remote.
* To drop a key from an export remote, take the tree-ish currently on the remote, drop all instances of the key from it, and push the changed tree-ish to the remote.
* To git-annex-export, add an option --add, which adds the exported tree-ish to the tree-ish currently on the remote without losing any keys currently on the remote: take the tree-ish currently on the remote; overlay on it the tree-ish being exported; for any files that would be overwritten, if no copies of that key would be left, move it to .keys/aaa/bbb/keyname in the tree-ish that is then pushed to the remote.
This way, can always just copy any tree to the remote, without worrying about losing data.
[[!tag needsthought]]

View file

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2020-01-30T18:05:29Z"
content="""
I think that the --sameas feature could be used to implement those combo remotes?
"""]]

View file

@ -2,3 +2,5 @@ Add `remote.<name>.annex-speculate-can-get` config setting for non-special remot
Then one can make a quick clone of the current repo, and instead of re-configuring all its remotes in the new clone, just configure the origin to be a `speculate-can-get` remote.
This would also be useful when you have unconnected but related repos, and want to occasionally share files between them without merging their histories.
[[!tag needsthought]]
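What the proposed setting might look like; this config key is hypothetical:

```
# hypothetical: let git-annex speculatively try origin for any key,
# ignoring what the location log says
git config remote.origin.annex-speculate-can-get true
```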

View file

@ -122,3 +122,5 @@ remaining todo:
to iterate over the unlocked files, filter out any that are modified,
and record the InodeCaches of the unmodified ones. Seems that it would
have to use git's index to know which files are modified.
[[!tag confirmed]]

View file

@ -26,3 +26,5 @@ a new empty directory in its place and start putting files in there.
What's needed is an action that creates directories only up to a given
point, which can be either .git/annex or the top of the worktree depending
on what's being done. --[[Joey]]
[[!tag confirmed]]

View file

@ -13,3 +13,5 @@ At least `curl --xattr` saves `xdg.origin.url`.
Perhaps `git-annex-metadata` could be leveraged to automatically store and restore xattrs? It might even be that adding xattrs would always have to be done through a git-annex command, but restoration would be done automatically if git-annex noticed there are xattrs stored in metadata and the file system is mounted with `user_xattr`.
The `user` namespace is used for user xattrs and thus for "proposed metadata attributes" above. These attributes are valid git-annex metadata fields as-is.
[[!tag unlikely]]

View file

@ -38,3 +38,5 @@ turn out to have missing content. So for this to really be useful,
the branch needs to automatically get updated.
--[[Joey]]
[[!tag needsthought]]

View file

@ -1,3 +1,5 @@
If file A is annexed and dropped, and B is a relative symlink to A, then git annex get B should result in A being fetched, but currently doesn't.
This would especially help if B is deep within some dir 'mydir', and you do git annex get mydir: annexed files under mydir get fetched,
but not annexed files elsewhere in the repository to which symlinks under mydir point. So such symlinks under mydir will remain broken.
[[!tag unlikely]]
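A manual workaround, assuming B is a single-level relative symlink:

```
# resolve the symlink one level, then get its target
target=$(readlink mydir/B)
git annex get -- "$(dirname mydir/B)/$target"
```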

Some files were not shown because too many files have changed in this diff.