remove old closed bugs and todo items to speed up wiki updates and reduce size
Remove closed bugs and todos that were last edited or commented on before
2022, except for ones tagged projects/*, since projects like datalad want
to keep records of old deleted bugs around longer.

Command line used:

    for f in $(grep -l '|done\]\]' -- ./*.mdwn); do
        if ! grep -q "projects/" "$f"; then
            d="$(echo "$f" | sed 's/.mdwn$//')"
            if [ -z "$(git log --since=01-01-2022 --pretty=oneline -- "$f")" \
                 -a -z "$(git log --since=01-01-2022 --pretty=oneline -- "$d")" ]; then
                git rm -- "./$f"
                git rm -rf "./$d"
            fi
        fi
    done
    for f in $(grep -l '\[\[done\]\]' -- ./*.mdwn); do
        if ! grep -q "projects/" "$f"; then
            d="$(echo "$f" | sed 's/.mdwn$//')"
            if [ -z "$(git log --since=01-01-2022 --pretty=oneline -- "$f")" \
                 -a -z "$(git log --since=01-01-2022 --pretty=oneline -- "$d")" ]; then
                git rm -- "./$f"
                git rm -rf "./$d"
            fi
        fi
    done
This commit is contained in:
parent acdd5fbab6
commit 4d90053e17

427 changed files with 0 additions and 15690 deletions
@@ -1,8 +0,0 @@
# As is

On FAT disks, annex uses an adjusted unlocked branch. Files take up double the space: once in the file tree and once in the .git folder.

# As I would like it

On such disks, with the annex.thin option, annex would keep content only in the file tree; the copies in the .git folder would be wiped.

> [[done]], dup of other todo, and I don't know how to avoid the problem
> with git deleting the file. --[[Joey]]

@@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-02-08T18:17:00Z"
content="""
This has the following problem: You run git pull. A file got deleted. git
deletes the file in the repository directory. That was the only copy of the
content, so it's now impossible to revert the deletion and get the file
back, which you're supposed to be able to do.

This is why git-annex has to either make a copy or hard link the file
away for safekeeping.

As already discussed in [[annex.thin without hardlinks]].
"""]]

@@ -1,23 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2016-09-21T17:11:06Z"
content="""
Standard encfs warning: It's buggy and insecure. Don't use it.

You can find many other problems caused by encfs on this site, and
<https://defuse.ca/audits/encfs.htm> has described security problems with
encfs for years.

It would not help for `git-annex add` to check some kind of filename limit,
because it would not prevent you doing this:

    git annex add smallenough
    git mv smallenough oh-oops-my-name-is-too-long-for-encfs
    git commit -m haha

A git pre-commit hook can of course be written that blocks such commits.

I am not inclined to complicate git-annex just to handle encfs given how
broken encfs is.
"""]]

@@ -1,14 +0,0 @@
[[!comment format=mdwn
username="interfect@b151490178830f44348aa57b77ad58c7d18e8fe7"
nickname="interfect"
subject="Pre Commit Hook"
date="2016-09-21T19:20:04Z"
content="""
I'm basically stuck with whatever home directory encryption Canonical deigns to give me in their setup wizard, given my time and attention budget. I've looked a bit at the security problems with it and they mostly seem to be that it's a bit leaky due to not hiding structures and sizes. Hiding contents is better than not hiding contents, so that's what I've got.

Anyway, a pre-commit hook, or maybe an update hook, would be a great solution. I'd like one to be on the wiki somewhere as a useful tip for actually using git annex effectively across a bunch of non-ideal environments. It would be great if a \"git annex init\" could set it up for me, too.

Any ideas for writing a pre-commit script that works on Linux, Mac, Windows, Android, and whatever weird embedded NAS things people might want to use it on? If I went with an update script over a pre-commit, that would make platform support less of a problem, but then you'd get Git Annex into weird situations when syncing.

How would Git Annex react if I made a commit on one system, but then my central syncing repo's update script rejected the commit for breaking the rules on file names? If I have a commit that isn't allowed to be pushed to a particular remote, how would I use git annex to get it out of the history of any repos it might have already gotten to?
"""]]

@@ -1,8 +0,0 @@
[[!comment format=mdwn
username="interfect@b151490178830f44348aa57b77ad58c7d18e8fe7"
nickname="interfect"
subject="comment 3"
date="2016-09-21T19:32:06Z"
content="""
Also, I think Ubuntu is \"ecryptfs\" and not \"encfs\" anyway.
"""]]

@@ -1,23 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2016-09-21T20:16:44Z"
content="""
If an update hook rejects a push, then `git annex sync` will just note that
it was unable to push. It will sync in only one direction until the problem
that prevents pushing gets resolved.

It might try pushing to a different branch name than usual to get around
some other problems that cause pushes to fail, so be sure to have the update
hook check pushes to all branches (except for the git-annex branch).

I don't know why you'd want to filter such a commit out of the git history.
You could just fix it by renaming the problem file and making a commit on top
of the problem commit. Just make the update hook only look at the diff
between the old version of the branch and the new version, so it won't be
tripped up by intermediate commits that violate its rules.

(I know that Ubuntu uses encfs or something like that by default, but
surely they have not removed the Debian installer's support for full
disk encryption?)
"""]]

@@ -1,10 +0,0 @@
[[!comment format=mdwn
username="interfect@b151490178830f44348aa57b77ad58c7d18e8fe7"
nickname="interfect"
subject="comment 5"
date="2016-09-21T22:49:55Z"
content="""
OK, I'll try something like that.

(Full disk encryption is still there; I think on one system I just have ecryptfs, because I want to be able to get in over ssh sometimes, and on one I have *both* FDE and ecryptfs on, because I enjoy performance penalties.)
"""]]

@@ -1,9 +0,0 @@
Would it be hard to support MD5E keys that omit the -sSIZE part, the way this is allowed for URL keys? I have a use case where I have the MD5 hashes and filenames of files stored in the cloud, but not their sizes, and want to construct keys for these files to use with setpresentkey and registerurl. I could construct URL keys, but then I lose the error-checking and have to set `annex.security.allow-unverified-downloads`. Or maybe, extend URL keys to permit an -hMD5 hash to be part of the key?

Another (and more generally useful) solution would be [[todo/alternate_keys_for_same_content/]]. Then one could start with a URL-based key but then attach an MD5 to it as metadata, and have the key treated as a checksum-containing key, without needing to migrate the contents to a new key.

> Closing, because [[external_backends]] is implemented, so you should be
> able to roll your own backend for your use case here. Assuming you can't
> just use regular MD5E and omit the file size field, which will work too.
> [[done]]
> --[[Joey]]
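For illustration, the two key shapes under discussion, with and without the size field, can be sketched like this. The helper is hypothetical and only shows the general `MD5E-sSIZE--HASH.EXT` shape; consult git-annex's key documentation for the authoritative grammar, and note the extension handling here is deliberately simplified.

```bash
#!/bin/bash
# Hypothetical helper sketching MD5E-style key strings, with and
# without the -sSIZE field that this todo asks to make optional.
md5e_key() {
    # build a key string for file $1; pass a size as $2 to include -sSIZE
    local f=$1 size=${2:-} hash ext
    hash=$(md5sum "$f" | cut -d' ' -f1)
    ext=${f##*.}   # simplified; real extension handling is fancier
    if [ -n "$size" ]; then
        echo "MD5E-s${size}--${hash}.${ext}"
    else
        echo "MD5E--${hash}.${ext}"
    fi
}
```

A key built this way is what would then be fed to `git annex setpresentkey` or `git annex registerurl`, as the todo describes.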

@@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-01-22T15:54:03Z"
content="""
Have you tried just constructing MD5E keys without the size value?
git-annex still supports keys from before the v2 repo version that did not
include size, so I'd guess it would work ok.
"""]]

@@ -1,27 +0,0 @@
What steps will reproduce the problem?
Sync a lot of small files.

What is the expected output? What do you see instead?
The expected output is hopefully a fast transfer.

But currently it seems like git-annex is only using one thread to transfer (per host, or total?).

An option to select the number of transfer threads to use (possibly per host) would be very nice.

> Opening a lot of connections to a single host is probably not desirable.
>
> I do want to do something to allow slow hosts to not hold up transfers to
> other hosts, which might involve running multiple queued transfers at
> once. The webapp already allows the user to force a given transfer to
> happen immediately. --[[Joey]]

And maybe also an option to limit how long a queue the browser should show; it can become quite resource intensive with a long queue.

> The queue is limited to 20 items for this reason. --[[Joey]]


---

> There has been a lot of improvement in both parallelization support
> and per-file overhead on speed since this todo was filed. This todo does
> not look relevant enough to leave open, so [[done]] --[[Joey]]

@@ -1,12 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnKT33H9qVVGJOybP00Zq2NZmNAyB65mic"
nickname="Lucas"
subject="comment 1"
date="2014-11-12T07:58:07Z"
content="""
Opening multiple connections to a host can be preferable sometimes, and it's unlikely to be an issue at all for the larger remotes like Google, Microsoft or S3.

For example, the OneDrive provider spends a lot of time sitting around waiting for initialisation between uploads. Using, say, 5 threads instead of 1 would allow it to continue doing things while it waits.

Multiple connections can also vastly improve upload speeds for users with congested home internet connections.
"""]]

@@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://launchpad.net/~krastanov-stefan"
nickname="krastanov-stefan"
subject="Status of this issue"
date="2014-12-27T15:18:42Z"
content="""
I was unable to find a way to tell git-annex that certain remotes should receive multiple transfers in parallel. Is this implemented yet, or on the roadmap? If neither, would modifying the webapp to bear this logic without touching git-annex itself be a solution (asking mainly because it can be done with a greasemonkey script)?
"""]]

@@ -1,12 +0,0 @@
[[!comment format=mdwn
username="lhunath@3b4ff15f4600f3276d1776a490b734fca0f5c245"
nickname="lhunath"
subject="Simultaneous transfers"
date="2018-02-02T17:37:27Z"
content="""
I highly recommend ensuring that:

1. Each remote can configure a number of maximum simultaneous transfers, where each type of remote comes with a sensible default number.
2. Transfers to multiple individual remotes happen in parallel regardless of their simultaneous transfers setting.

Judging from the fact that simultaneous transfers happen just fine when you hit the > icon in the webapp, I would assume that most of the underbelly for this is already present.
"""]]

@@ -1,41 +0,0 @@
```
From 92dfde25409ae2268ab2251920ed11646c122870 Mon Sep 17 00:00:00 2001
From: Reiko Asakura <asakurareiko@protonmail.ch>
Date: Tue, 26 Oct 2021 15:46:38 -0400
Subject: [PATCH] Call freezeContent after move into annex

This change better supports Windows ACL management using
annex.freezecontent-command and annex.thawcontent-command and matches
the behaviour of adding an unlocked file.

By calling freezeContent after the file has moved into the annex,
the file's delete permission can be denied. If the file's delete
permission is denied before moving into the annex, the file cannot
be moved or deleted. If the file's delete permission is not denied after
moving into the annex, it will likely inherit a grant for the delete
permission which allows it to be deleted irrespective of the permissions
of the parent directory.
---
 Annex/Content.hs | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Annex/Content.hs b/Annex/Content.hs
index da65143ab..89c36e612 100644
--- a/Annex/Content.hs
+++ b/Annex/Content.hs
@@ -346,6 +346,9 @@ moveAnnex key af src = ifM (checkSecureHashes' key)
 		liftIO $ moveFile
 			(fromRawFilePath src)
 			(fromRawFilePath dest)
+		-- On Windows the delete permission must be denied only
+		-- after the content has been moved in the annex.
+		freezeContent dest
 		g <- Annex.gitRepo
 		fs <- map (`fromTopFilePath` g)
 			<$> Database.Keys.getAssociatedFiles key
--
2.30.2

```

> [[applied|done]] --[[Joey]]

@@ -1,49 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-10-26T17:49:00Z"
content="""
Thank you for putting this patch together. It is especially helpful to get
patches from a windows user, since it's far from my comfort zone.

---

My first concern was what happens if git-annex is interrupted after moving
the object into place but before freezeContent, leaving an object file
with possibly unsafe permissions. Looks like `git-annex fsck` will
correct that, if it's run.

As you mentioned, when an unlocked file is added, and linkToAnnex
is called, it does move the object into the annex before freezeContent.
Although that may have been an oversight really. It could just as well
freeze before moving and so avoid leaving the file with the wrong
permissions when interrupted.

And there are other situations where being interrupted can have the same
result. Eg, in lockContentForRemoval, it calls thawContent, then an action
that may take long enough to be interrupted, and then freezeContent.
And it's hard to see any other way that could work; it can't
move the object out of the object directory before thawing it.

So, this seems ok, I suppose.

---

In Annex.Ingest, `lockDown'` calls freezeContent on the file
when it's still in the work tree. So I think that would have the same
problem you're trying to prevent with this patch?

Command.Import also has a call to freezeContent that is not on the final
object file location.

A windows-specific feature like this risks getting broken, so maybe
it would be good to change freezeContent to avoid such problems. Eg,
it could be changed to take a Key, and freeze the object file
for that Key. But at least the call in Annex.Ingest needs to happen
before there is a Key.

So perhaps there should be a freezeContent
and a separate freezeObject, which takes a Key. There could
then be a separate annex.freezeobject-command that gets run only
for freezeObject, not freezeContent.
"""]]

@@ -1,9 +0,0 @@
[[!comment format=mdwn
username="asakurareiko@f3d908c71c009580228b264f63f21c7274df7476"
nickname="asakurareiko"
avatar="http://cdn.libravatar.org/avatar/a865743e357add9d15081840179ce082"
subject="comment 2"
date="2021-10-26T19:54:53Z"
content="""
Sorry, I missed explaining a few things and made a mistake in the patch. I made my freeze script detect whether the input is inside or outside of .git/annex/objects, so there are no problems with calling freezeContent on something in the working tree. The problem is not calling freezeContent on the final object, because the delete permission can only be denied at that point. The easiest way without compromising the safety of the previous behaviour is to add another freezeContent call after moveFile.
"""]]

@@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-10-27T17:56:15Z"
content="""
Ah, making your script smart is reasonable enough.

I hope you might consider sharing the script in a tip?

Looking at your updated patch, you now leave the freezeContent call before it
moves to the object file, and add another call afterwards. I think that would
be objectionable if the user has a freeze hook that is expensive
to run, because it would unnecessarily run twice. I fairly well satisfied
myself in comment #1 that it's ok to defer freezeContent to after it's
moved the object file into place.

So, I've applied it, but modified to remove that earlier freezeContent.
"""]]

@@ -1,3 +0,0 @@
Sqlite docs [say](https://www.sqlite.org/pragma.html#pragma_synchronous) "commits can be orders of magnitude faster with synchronous OFF". The downside is a chance of db corruption if power fails at a bad moment, but since git-annex's dbs can be re-generated from git data, maybe that's a tradeoff some users would be ok with? One usually knows when power has failed.

> [[closing|done]] per comments --[[Joey]]

@@ -1,47 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-06-23T17:05:38Z"
content="""
I think it could at least use synchronous=NORMAL, entirely safely, since it
uses WAL mode.

"WAL mode is always consistent with synchronous=NORMAL, but WAL mode does
lose durability. A transaction committed in WAL mode with
synchronous=NORMAL might roll back following a power loss or system crash."

It's certainly already possible for a power loss or ctrl-c while git-annex
is running to cause database changes to be lost, since git-annex buffers
several changes together into a transaction and until it sends that
transaction, can lose the data.

Exactly how well git-annex recovers from that probably varies, eg
Database.Keys.reconcileStaged flushes the transactions before updating its
own state files, so on power loss it will just run again and recover. The
fsck database gets recovered likewise. But there are probably other write points
where getting the data recovered is harder.

For example, moveAnnex updates the inode cache at the end when it has populated
a pointer file. If that database write is lost, git-annex won't know that
the pointer file is populated with annexed content. So it will treat it as
a possibly modified unlocked file, and when it eventually has a reason to,
will re-hash it, and then should recover the lost information.

Quite possibly there are situations where it fails to recover the lost
information and does something annoying. But like I said, such situations
can already happen, and setting synchronous=NORMAL does not make them more
likely.

It would still make sense to benchmark it before changing to it. It may
well be that git-annex's buffering of changes into larger transactions
already has a similar performance gain as the pragma and that the pragma
does not speed it up.

As far as OFF goes, I'd need to see some serious performance improvements
in benchmarking, and also be sure that git-annex always recovered well,
which would have to somehow include detecting corrupted sqlite databases
and rebuilding them. I don't know if it's really possible to detect.
Might some form of corrupted sqlite database cause sqlite, and thus
git-annex, to crash? And rebuilding might entail re-hashing the entire
repository, so very expensive.
"""]]

@@ -1,19 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="recovering from sqlite db corruption"
date="2021-06-23T18:45:47Z"
content="""
>detecting corrupted sqlite databases and rebuilding them. I don't know if it's really possible to detect.

Could you detect whether a git-annex command finished normally, by creating a marker file when it starts, and deleting the marker file as the last thing before exiting?
The next command then checks if the previous one crashed, and rebuilds the dbs if yes (or just warns the user and offers to rebuild).

>Rebuilding might entail re-hashing the entire repository

Aren't all file hashes recorded in git, which would not be affected by a sqlite crash?
"""]]

@@ -1,102 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-06-23T19:15:55Z"
content="""
Benchmarked with NORMAL:

    joey@darkstar:~/tmp/t>/usr/bin/time ~/git-annex.synchrousNORMAL add 1??? --quiet
    6.99user 5.09system 0:11.63elapsed 103%CPU (0avgtext+0avgdata 68356maxresident)k
    143760inputs+40352outputs (819major+404774minor)pagefaults 0swaps
    joey@darkstar:~/tmp/t>/usr/bin/time ~/git-annex.synchrousNORMAL add 2??? --quiet
    7.71user 5.15system 0:11.93elapsed 107%CPU (0avgtext+0avgdata 69876maxresident)k
    11336inputs+42648outputs (9major+414417minor)pagefaults 0swaps
    joey@darkstar:~/tmp/t>/usr/bin/time ~/git-annex.synchrousNORMAL add 3??? --quiet
    7.99user 5.16system 0:12.20elapsed 107%CPU (0avgtext+0avgdata 70452maxresident)k
    11952inputs+44200outputs (8major+415267minor)pagefaults 0swaps
    joey@darkstar:~/tmp/t>/usr/bin/time ~/git-annex.synchrousNORMAL add 4??? --quiet
    8.30user 5.25system 0:12.62elapsed 107%CPU (0avgtext+0avgdata 69496maxresident)k
    17784inputs+45776outputs (9major+416640minor)pagefaults 0swaps

Which is no improvement over git-annex with no pragmas. Actually slower.

    joey@darkstar:~/tmp/t>/usr/bin/time ~/git-annex.orig add 1??? --quiet
    6.89user 5.36system 0:11.39elapsed 107%CPU (0avgtext+0avgdata 50576maxresident)k
    47064inputs+40352outputs (5616major+404472minor)pagefaults 0swaps
    joey@darkstar:~/tmp/u>/usr/bin/time git-annex add 2??? --quiet
    7.76user 5.09system 0:11.88elapsed 108%CPU (0avgtext+0avgdata 70848maxresident)k
    12776inputs+42648outputs (9major+414346minor)pagefaults 0swaps
    joey@darkstar:~/tmp/u>/usr/bin/time git-annex add 3??? --quiet
    7.90user 5.26system 0:12.14elapsed 108%CPU (0avgtext+0avgdata 71676maxresident)k
    13824inputs+44200outputs (8major+415258minor)pagefaults 0swaps
    joey@darkstar:~/tmp/u>/usr/bin/time git-annex add 4??? --quiet
    8.22user 5.38system 0:12.49elapsed 108%CPU (0avgtext+0avgdata 71652maxresident)k
    14216inputs+45776outputs (8major+416784minor)pagefaults 0swaps

OFF also benchmarks very close to the same.

    joey@darkstar:~/tmp/v>/usr/bin/time ~/git-annex.synchrousOFF add 1??? --quiet
    6.85user 5.58system 0:12.01elapsed 103%CPU (0avgtext+0avgdata 71100maxresident)k
    50080inputs+40352outputs (16major+405312minor)pagefaults 0swaps
    joey@darkstar:~/tmp/v>/usr/bin/time ~/git-annex.synchrousOFF add 2??? --quiet
    7.64user 5.31system 0:11.96elapsed 108%CPU (0avgtext+0avgdata 71392maxresident)k
    12672inputs+42640outputs (8major+414373minor)pagefaults 0swaps
    joey@darkstar:~/tmp/v>/usr/bin/time ~/git-annex.synchrousOFF add 3??? --quiet
    8.02user 5.15system 0:12.19elapsed 108%CPU (0avgtext+0avgdata 71556maxresident)k
    11648inputs+43928outputs (8major+415140minor)pagefaults 0swaps
    joey@darkstar:~/tmp/v>/usr/bin/time ~/git-annex.synchrousOFF add 4??? --quiet
    8.24user 5.24system 0:12.41elapsed 108%CPU (0avgtext+0avgdata 71224maxresident)k
    10952inputs+45304outputs (8major+416560minor)pagefaults 0swaps

One pass did run 0.08s faster, which could be due to not syncing, but it does
not seem a significant optimisation, at least not on this SSD.

Should be noted that transactions build up 1000 changes, and that benchmark
was operating on 1000 files per run, so it probably only wrote one or two
transactions.

Here's the patch that adds a pragma:

    diff --git a/Database/Handle.hs b/Database/Handle.hs
    index d7f1822dc..2d66af5e6 100644
    --- a/Database/Handle.hs
    +++ b/Database/Handle.hs
    @@ -1,11 +1,11 @@
     {- Persistent sqlite database handles.
      -
    - - Copyright 2015-2019 Joey Hess <id@joeyh.name>
    + - Copyright 2015-2021 Joey Hess <id@joeyh.name>
      -
      - Licensed under the GNU AGPL version 3 or higher.
      -}
     
    -{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
    +{-# LANGUAGE TypeFamilies, FlexibleContexts, OverloadedStrings #-}
     
     module Database.Handle (
     	DbHandle,
    @@ -34,6 +34,7 @@ import qualified Data.Text as T
     import Control.Monad.Trans.Resource (runResourceT)
     import Control.Monad.Logger (runNoLoggingT)
     import System.IO
    +import Lens.Micro
     
     {- A DbHandle is a reference to a worker thread that communicates with
      - the database. It has a MVar which Jobs are submitted to. -}
    @@ -194,10 +195,13 @@ runSqliteRobustly tablename db a = do
     	maxretries = 100 :: Int
     
     	rethrow msg e = throwIO $ userError $ show e ++ "(" ++ msg ++ ")"
    -
    +
    +	conninfo = over extraPragmas (const ["PRAGMA synchronous=OFF"]) $
    +		mkSqliteConnectionInfo db
    +
     	go conn retries = do
     		r <- try $ runResourceT $ runNoLoggingT $
    -			withSqlConnRobustly (wrapConnection conn) $
    +			withSqlConnRobustly (wrapConnectionInfo conninfo conn) $
     			runSqlConn a
     		case r of
     			Right v -> return v
"""]]

@@ -1,10 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="thanks"
date="2021-06-24T16:40:48Z"
content="""
Thanks for doing the benchmarking; it seems like git-annex's batching of operations already captures whatever speedup de-synchronizing could give.
"""]]

@@ -1,14 +0,0 @@
Just as `git annex` runs git-annex, `git annex foo` could run git-annex-foo
when foo is not built-in.

One user of this would be annex-review-unused, whose
author would rather name it git-annex-reviewunused if that
made "git annex reviewunused" work.

In CmdLine, where autocorrect is handled, it would need to
search the path for all "git-annex-" commands and then
either dispatch the one matching the inputcmdname,
or do autocorrect with the list of those commands
included along with the builtins. --[[Joey]]

> [[done]] --[[Joey]]
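The dispatch idea can be illustrated with a shell sketch. This is hypothetical: the real work was done in Haskell in git-annex's CmdLine, and the function names and output here are invented; the sketch only shows the PATH lookup a wrapper would do, in the same spirit as git's own `git-foo` external-command dispatch.

```bash
#!/bin/bash
# Hypothetical sketch of external subcommand dispatch:
# "git annex foo" falls back to a "git-annex-foo" found on PATH.
find_external() {
    # print the path of git-annex-$1 if it is on PATH (exit 1 if not)
    command -v "git-annex-$1"
}

dispatch() {
    local sub=$1; shift
    local ext
    if ext=$(find_external "$sub"); then
        # a real wrapper would: exec "$ext" "$@"
        echo "would exec: $ext"
    else
        echo "unknown subcommand: $sub" >&2
        return 1
    fi
}
```

The real implementation additionally has to merge the discovered `git-annex-*` names into the list used for autocorrect, as the todo notes.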

@@ -1,43 +0,0 @@
Currently annex.thin needs hard link support to be efficient;
it hard links the content from .git/annex/objects into the work tree.
When hard links are not supported, two copies of checked out files exist on
disk.

Would it be possible to make it work w/o hard links? Note that direct mode
does avoid two copies of files.

IIRC the main reason for the hard link is so, when git checkout deletes a
work tree file, the only copy of the file is not lost. Seems this would
need a git hook run before checkout to rescue such files.

Also some parts of git-annex's code, including `withObjectLoc`, assume
that the .annex/objects is present, and so it would need to be changed
to look at the work tree file. --[[Joey]]

> A git hook is not sufficient. Consider the case of "rm file; git checkout file".
> Without hard links, if the only copy of the annex object was in that
> deleted file, it can't be restored. Now, direct mode did have the same
> problem, but it didn't support `git checkout`, so the user didn't have
> reason to expect such a workflow to work.
>
> So, I think this is not possible to implement in a way that won't
> lead to users experiencing data loss when using it and doing
> perfectly normal git things like this.
>
> (Although to be fair, annex.thin has its own data loss scenarios,
> involving modifying a file potentially losing the only copy of
> the old version. The difference, I think, is that with it,
> you modify the file yourself and so lose the old version; the data
> loss does not happen when you run git checkout or git pull!)
>
> In the meantime,
> git-annex has gotten support for directory special remotes with
> import/export tree. This can be used instead, for use cases such as a
> device with a FAT filesystem. The git-annex repo can live on another
> filesystem that does support hard links or symlinks, or where using
> double disk space is not as much of a problem, or can even be a bare
> git repo. That syncs up with the FAT device through tree import and
> export. Once content has been imported to the git-annex repo,
> the user can delete files from the FAT device without losing data.
>
> So this seems about as good as it can get. [[done]] --[[Joey]]

@@ -1,38 +0,0 @@
Add a git config to limit the bandwidth of transfers to/from remotes.

rsync has --bwlimit, so this used to work, but rsync is not used by modern
git-annex for p2p transfers. (bup also has a --bwlimit.)

This should be possible to implement in a way that works for any remote
that streams to/from a bytestring, by just pausing for a fraction of a
second when it's running too fast. The way the progress reporting interface
works, it will probably work to put the delay in there. --[[Joey]]

[[confirmed]]

> Implemented and works well. [[done]] --[[Joey]]

> Note: A local git remote, when resuming an interrupted
> transfer, has to hash the file (with default annex.verify settings),
> and that hashing updates the progress bar, and so the bwlimit can kick
> in and slow down that initial hashing, before any data copying begins.
> This seems perhaps ok; if you've bwlimited a local git remote,
> presumably you're wanting to limit disk IO. The only reason it might not be ok
> is if the intent is to limit IO to the disk containing the remote
> but not the one containing the annex repo. (This also probably
> holds for the directory special remote.)
> Other remotes, including git over ssh, when resuming don't have that
> problem. Looks like chunked special remotes narrowly avoid it, just
> because their implementation chose to not do incremental verification
> when resuming. It might be worthwhile to differentiate between progress
> updates for incremental verification setup and for actual transfers, and
> only rate limit the latter, just to avoid fragility in the code.
> I have not done so yet though, and am closing this.
> --[[Joey]]

> (One other small caveat is that it pauses after each chunk, which means
> it pauses unnecessarily after the last chunk of the file. It doesn't know
> it's the last chunk, and it would be hard to teach it. And the chunks
> tend to be 32kb or so, and the pauses a small fraction of a second. So
> mentioning this only for completeness.) --[[Joey]]
|
||||
|
|
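The "pause when running too fast" idea above can be sketched in a few lines. This is a hypothetical illustration only, not git-annex's actual Haskell implementation; the `BandwidthLimiter` name and the injectable clock/sleep parameters are invented for the sketch:

```python
import time

class BandwidthLimiter:
    """Throttle a transfer by sleeping inside the progress callback.

    Whenever the bytes seen so far exceed what the configured rate
    allows for the elapsed time, sleep off the difference.
    """

    def __init__(self, bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = bytes_per_sec
        self.clock = clock
        self.sleep = sleep
        self.start = clock()

    def progress(self, bytes_so_far):
        elapsed = self.clock() - self.start
        expected = bytes_so_far / self.rate  # time this much data *should* take
        if expected > elapsed:
            self.sleep(expected - elapsed)   # running too fast: pause briefly


# Simulated usage with a fake clock, so no real waiting happens:
class FakeClock:
    def __init__(self):
        self.now = 0.0
        self.slept = 0.0
    def clock(self):
        return self.now
    def sleep(self, secs):
        self.slept += secs
        self.now += secs

fake = FakeClock()
lim = BandwidthLimiter(bytes_per_sec=1000, clock=fake.clock, sleep=fake.sleep)
lim.progress(500)   # 500 bytes at t=0: should take 0.5s, so sleeps 0.5s
lim.progress(1000)  # now at t=0.5; 1000 bytes should take 1.0s, sleeps 0.5s more
```

Because the pause happens inside the progress callback, the same throttle works for any remote that reports progress while streaming, which matches the approach described in the todo. It also illustrates the chunk-pause caveat: the callback cannot know a chunk is the last one.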
@ -1,7 +0,0 @@
AFAICT, the `annex/` subdir in a bare annex repo is the exact same layout as a directory special remote.

It'd be very useful if its parameters could be customised just like an actual directory special remote to allow for e.g. encrypted and/or chunked storage. I have a use-case where this could significantly simplify things.

An interesting side-effect of this would be a tweakable location for a bare repo's storage which could be used to separate metadata and data (i.e. git repo on SSD for fast syncs and actual data on an HDD).

> [[rejected|done]] --[[Joey]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 1"
date="2021-05-25T16:48:26Z"
content="""
You can already do this by setting the `remote.<name>.annex-ignore` config option for the bare repo and initializing an independent directory special-remote.
"""]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 2"
date="2021-05-26T07:11:20Z"
content="""
The problem is that I need this repo to stay a remote in the eyes of all other repos; I need to be able to get files from and add new ones to it. I just want its storage back-end to work a little differently so that it fits my use-case.
"""]]
@ -1,22 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-05-27T16:15:31Z"
content="""
It would not make sense for a non-bare git repository to have annexed
contents in it encrypted or chunked, because that would prevent actually
accessing the annexed files at all; git-annex symlinks have to point to a
complete, non-encrypted file.

Bare git repositories are a very minor special case of non-bare git
repositories; they do not have a work tree or index. In other
respects, they are the same, and it's entirely possible to manually
convert a git repo to or from bare, or even temporarily use a bare repo
with a work tree.

It would be extremely inelegant if git-annex did something that broke
that. Which this would.

I think you should possibly use an rsync special remote, which also has the
same layout as a directory special remote.
"""]]
@ -1,5 +0,0 @@
When you want to mark a file dead in your checkout, you can only do so via the key of the file. You can find the corresponding key with a bit of bash like this: `git annex dead --key $(basename $(readlink file))`, but that shouldn't be necessary IMO.

It'd be a lot better if you could just mark files dead like this: `git annex dead --file file`, or even like this: `git annex dead --file file1 file2 file3 otherfiles.*` (or maybe even like this: `git annex dead --file file1 file2 --key $key1 $key2`).

> [[done]] in another way --[[Joey]]
@ -1,25 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-05-31T19:08:32Z"
content="""
I suppose this could be useful, but note that `git annex fsck` without
--all will still warn if it finds a file in the working tree with no
existing content, even if its key has been marked dead. Because having a
file in the working tree that you can't get is certainly a bad situation.

So, if this feature got implemented, you would want to follow `git annex
dead` of a file with `git rm` of the file. Probably.

The other reason dead only operates on keys is that the expected
workflow was that the user will lose data, will delete the lost file out of
their working tree, or overwrite it or whatever, and then at some later
point get annoyed that fsck --all complains about it, and so then mark it
dead. But if you want to be proactive, marking a file dead is certainly
useful to be able to do.

I'd also be concerned that `git annex dead` or `git annex dead .` run
accidentally could be an annoying mistake to recover from. Certainly
it should not default to marking all files dead when there are no
parameters!
"""]]
@ -1,26 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-06-25T18:22:04Z"
content="""
In <https://git-annex.branchable.com/forum/Forget_about_accidentally_added_file__63__/>
there is an idea of `git annex unannex --forget file`.

And using unannex for this makes some sense; it's intended to be used to undo an
accidental `git-annex add`. When it's used that way, and later `git-annex
unused` finds the object file is not used by anything and the object gets
deleted, fsck --all will start complaining about it.

But there are still many ways it could go wrong. Being run recursively by
accident. Or another file, perhaps in another branch, using the same key,
which gets marked as dead.

Hmm, `git annex dropunused` (or `drop --unused`)
could mark the key as dead. At that point it's known to be unused.

This way, the existing workflow of git-annex unannex followed by git-annex
unused followed by dropping can be followed, and fsck --all does
not later complain about the key.

Done!
"""]]
@ -1,38 +0,0 @@
Consider this, where branch foo has ten to a hundred thousand files
not in the master branch:

    git checkout foo
    touch newfile
    git annex add newfile

After recent changes to reconcileStaged, the result can be:

    add newfile 0b 100% # cursor sits here for several seconds

This is because it has to look in the keys db to see if there's an
associated file that's unlocked and needs populating with the content of
this newly available key, so it does reconcileStaged, which can take some
time.

One fix would be, if reconcileStaged is taking a long time, make it display
a note about what it's doing:

    add newfile 0b 100% (scanning annexed files...)

It would also be possible to do the scan before starting to add files,
which would look more consistent and would avoid it getting stuck
with the progress display in view:

    (scanning annexed files...)
    add newfile ok

> [[done]] --[[Joey]]

It might also be possible to make reconcileStaged run a less expensive
scan in this case, eg the scan it did before
[[!commit 428c91606b434512d1986622e751c795edf4df44]]. In this case, it
only really cares about associated files that are unlocked, and so
diffing from HEAD to the index is sufficient, because the git checkout
will have run the smudge filter on all the unlocked ones in HEAD and so it
will already know about those associated files. However, I can't say I like
this idea much because it complicates using the keys db significantly.
@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-06-08T15:21:02Z"
content="""
Made `git-annex smudge --update` run the scan, and so the post-checkout or
post-merge hook will call it.

That avoids the scenario shown above. But adding a lot of files to the
index can still cause a later pause for reconcileStaged without indication
of what it's doing.
"""]]
@ -1,22 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-06-08T16:03:13Z"
content="""
I tried making reconcileStaged display the message itself; this is the
result:

    add foo
    100% 30 B 73 KiB/s 0s(scanning for annexed files...)
    ok

So for that to be done, showSideAction would need to clear the progress
bar display first. Note that the display is ok when concurrent output is
enabled:

    add c (scanning for annexed files...)
    ok

Ok.. Fixed that display glitch, and made reconcileStaged display
the message itself when it's taking a while to run.
"""]]
@ -1,32 +0,0 @@
The protocol has `GETCONFIG`, which gives access to the configuration
stored in remote.log, but it does not provide a good way to access git
configs set on the remote.

Datalad uses `GETCONFIG name` to get the remote name, and
then uses git config to get its configs. That is suboptimal
because sameas remotes use sameas-name instead, and also because
the two names are not necessarily the same, eg `git remote rename` can
rename the git remote while the git-annex config still uses the other name.
<https://github.com/datalad/datalad/issues/4259>

One way to do that is `GETUUID` and then look for the git remote with
annex-uuid set to that, in order to learn its name and then find its other git
configs. But, it's also possible for there to be multiple git remotes with the
same annex-uuid. (This does not happen with sameas remotes, but like a git repo
can have multiple remotes pointing to it by different paths, the same can be
set up for a special remote, at least in theory.)

So, the protocol should be extended. Either with a way to get/set a single git
config (like `GETCONFIG`/`SETCONFIG` do with the remote.log config), or with a
way to get the git remote name.

The latter has the problem that this business of there being multiple
names for different related things that might be different but are probably
the same is perhaps not something people want to learn about.

The former seems conceptually simpler, but there might be things that
`git config` could do that providing an interface on top of it would not
allow. The --type option is one thing that comes to mind. --[[Joey]]

> [[done]] as the GETGITREMOTENAME protocol extension and message.
> --[[Joey]]
@ -1,7 +0,0 @@
`git annex fsck` currently spams the terminal with all keys in a repo and prints `git-annex: fsck: n failed` at the end if errors occur. Finding these errors in a sea of `ok`s is not trivial, however.

A simple solution to this could be an fsck option which skips printing ok'd (and perhaps also dead) keys, i.e. `--no-ok` and `--no-dead`.

[[!meta title="mention common options on per-command man pages"]]

> common option man page and references [[done]] --[[Joey]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 1"
date="2021-05-10T12:21:37Z"
content="""
Just use the `--quiet` option, then it will only show the errors (failed files/keys).
"""]]
@ -1,10 +0,0 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 2"
date="2021-05-10T14:13:55Z"
content="""
Thanks, that's exactly what I'm looking for!

It's not in the git-annex-fsck manpage though for some reason.
"""]]
@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-05-10T15:07:06Z"
content="""
Normally the common options are not included in every command's man page
because there are over 100 lines of them. However, I do think it's worth
including --quiet on fsck's man page in this specific case and am doing
that.

Maybe individual command man pages should mention that there are
also a bunch of common options. Perhaps those should be split out of the
git-annex man page, like the git-annex-matching-options man page is
handled.
"""]]
@ -1,55 +0,0 @@
ATM `annex get` (in particular with '--json --json-error-messages --json-progress') channels to the user the error from an attempt to get a key from a remote with a message which lacks information about the remote and/or the specifics of that particular attempt (e.g. which URL was attempted from the web remote), e.g.

```
$> git clone https://github.com/dandisets/000029 && cd 000029
Cloning into '000029'...
remote: Enumerating objects: 326, done.
remote: Counting objects: 100% (326/326), done.
remote: Compressing objects: 100% (160/160), done.
remote: Total 326 (delta 137), reused 295 (delta 106), pack-reused 0
Receiving objects: 100% (326/326), 45.53 KiB | 1.30 MiB/s, done.
Resolving deltas: 100% (137/137), done.
dandiset.yaml sub-RAT123/ sub-anm369962/ sub-anm369963/ sub-anm369964/

$> git update-ref refs/remotes/origin/git-annex b822a8d40ff348a60602f13d0add989bd24e727a # URLs fixed since then

$> git annex get sub-RAT123
get sub-RAT123/sub-RAT123.nwb (from web...)

download failed: Not Found

ok
(recording state in git...)

$> git annex version | head -n 1
git-annex version: 8.20210803+git165-g249d424b8-1~ndall+1
```

NB. That "download failed: Not Found" is also channeled in that form (without any extra information) among the "errors" of `--json-error-messages` (and each progress message within `--json-progress`).

As such the message is not really informative, and might even be a bit confusing to the user since `get` does `ok` eventually here.
I think it is useful to channel such information but it should be extended, e.g. in this case it could be

```
failed to retrieve content from 'web' remote: https://api.dandiarchive.org/api/dandisets/000029/versions/draft/assets/b3675aad-db07-4fd4-9cce-c95f1184e7a3/download/ - Not Found
```

or alike. Even though considerably longer, it immediately provides feedback about which remote it failed to retrieve from, and what that particular URL was.


refs in DataLad issues:

- from web remote: ["download failed: Not Found"](https://github.com/datalad/datalad/pull/5936)
- from ["failed to retrieve content from remote"](https://github.com/datalad/datalad/issues/5750)

> I think this is specific to downloading urls, although it can happen
> for a few remotes (web, external). There's really no reason to display
> a download failed message if it successfully downloads a later url.
> (After all, if it had tried the working url first, it would never display
> anything about the broken url.)
>
> When all urls fail, it makes sense to display each url and why it failed
> when using the web (or external) remote, so the user can decide what to
> do about each of the problems.
>
> [[done]] --[[Joey]]
@ -1,3 +0,0 @@
Can git-annex-get be extended so that "git-annex-get --batch --key" fetches the keys (rather than filenames) given in the input?

> [[done]] --[[Joey]]
@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-09-18T17:07:56Z"
content="""
--key can't be reused for another meaning like this, it would make "--key
foo" be ambiguous.

It would need to be some other option, --batch-key or whatever.

Adding this would seem to open the door to adding it to every command that
supports --batch now. I'm unsure if the added complexity justifies it.

I'd be more sanguine if there were a way to reuse the existing batch
machinery and apply it to keys. But many commands' --batch honor file
matching options (eg --copies or --include), and that cannot be done when
using keys.
"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="https://christian.amsuess.com/chrysn"
nickname="chrysn"
avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc"
subject="Usefulness of batch key processing"
date="2020-05-15T09:21:15Z"
content="""
This would be quite helpful to tools using git-annex (eg. [annex-to-web](https://gitlab.com/chrysn/annex-to-web), issue [2](https://gitlab.com/chrysn/annex-to-web/-/issues/2)), especially for short-running things like `whereis` where the launching time dominates over the processing time.
"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="https://christian.amsuess.com/chrysn"
nickname="chrysn"
avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc"
subject="Re: Usefulness of batch key processing"
date="2020-05-15T09:33:22Z"
content="""
Concerning the filtering, I'd find a note that \"--batch-keys is mutually exclusive with filtering\" perfectly acceptable if that makes implementation easier. (Or \"only with the filtering options that apply to keys\" -- as I found that `git annex whereis --in web --key=...` does work well with the key input).
"""]]
@ -1,13 +0,0 @@
[[!comment format=mdwn
username="https://christian.amsuess.com/chrysn"
nickname="chrysn"
avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc"
subject="Another example"
date="2021-08-15T17:42:54Z"
content="""
The program at [[forum/Migrate_mark_files_dead]] shows again how batch-key would be useful, here for `git annex drop --from remote` and `git annex dead`.

I don't have numbers as I can't run it in batch, but comparing to other multi-file batch drop operations, I guesstimate this makes the difference between a script running for an hour invoking git-annex-drop a thousand times (with interruptions if the SSH agent decides to ask confirmation for a key again) and five minutes with --batch-key.

Like with the original use case of annex-to-web, filtering is not an issue for this application.
"""]]
@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-08-25T18:06:29Z"
content="""
I've implemented --batch-keys for the commands: get, drop, move, copy, whereis

That covers everything mentioned here except for dead, but that does not
support --batch yet, so if batch mode is needed for it, it can just use
--batch, not --batch-keys. However, after a recent change that makes
dropping unused keys automatically mark them dead, I suspect there
will not be a use case for that.

Most of the other commands that use --batch don't make sense to support
--batch-keys. Eg, add and find can't operate on keys, while
fromkey already operates on keys. About the only one that might is
rmurl, but it uses a custom batch format so would not be able to use the
current --batch-keys implementation. If someone needs that or some other
one, they can open a new todo.
"""]]
@ -1,86 +0,0 @@
Files in the git-annex branch use timestamps to ensure that the most
recently recorded state wins. This is unsatisfying, because it requires
accurate clocks among all users. It would be better to use vector clocks,
where possible, but it is not possible to use vector clocks for all
information in the branch.

To see why vector clocks can't be used for some information in the branch,
consider location log files. They are meant to reflect the actual state of
an external resource. Vector clocks can ensure that a consistent state is
agreed on by distributed users, but there's no way to guarantee that state
matches the actual state.

For example, let's assume there's a vector clock consisting of an
integer, and an object is being added and removed from a remote by multiple
parties. First Alice logs (present, 1), and then some time later, Alice
logs (missing, 2). Meanwhile, Bob merges (present, 1) from Alice
and then logs (missing, 2), followed by (present, 3). At some later point,
they merge back up, and the winning state is (present, 3) as it has the
highest vector clock. Is the content really present on the remote?
Well, we don't know; Alice could have removed it before Bob stored it,
or afterwards.

But, other information in the branch could use vector clocks. Consider the
numcopies setting. It's fine if the winner of a conflict over that is not
the one who set it most recently, as long as a value can be consistently
determined. So, the numcopies setting, and similar other configuration, is not
trying to track an external state, and so it could use vector clocks.

How would these vector clocks work, and how to transition to using them
without confusing old versions of git-annex that expect timestamps? A
change to a log could simply increment the clock from the previous
version of the log. This would make the new git-annex normally lose
when a conflicting change was written by an old git-annex, but the result
would be consistent, so that's acceptable.

Files that are related to external state need to continue to use
timestamps. But this could still be improved. Currently, if the clock is
wrongly set far in the future, logs using those timestamps will win over
other logs for a long time. This could break git-annex badly as there
becomes no way to correct wrong information.

Experimenting with `GIT_ANNEX_VECTOR_CLOCK`, it looks like `git annex fsck`
is able to recover from wrong location information being recorded with a
far future timestamp. It replaces that timestamp with the current one.
However, if that then gets union merged with a change to the same location
log made in another repository, fsck's correction can be lost in the merge.
Re-running the fsck will eventually get the information corrected, once a
non-union merge happens. However, `git annex fsck` can't correct other
logs, like remote state logs, if they end up with bad information with
a far future timestamp.

There's a mirror problem of information being recorded with a timestamp
in the past and being ignored. But, at least in that case, re-recording
good information with the right timestamp will fix the problem.

Consider making git-annex ignore future timestamps
(with some amount of allowance for minor clock skew). There are two
problems: one is that currently valid information gets ignored, until it's
able to be re-recorded. The second is that when the timestamp slips
into the past, the old, invalid information suddenly starts being taken
into account.

---

A better idea: When writing new information, check if the old
value for the log has a timestamp `>=` the current timestamp. If so, don't use the
current timestamp for the new information; instead increment the old
timestamp. So when there's clock skew (forwards or backwards), this makes
it fall back, effectively, to vector clocks.

This would work for both kinds of logs. For configuration changes,
it's kind of better than using only vector clocks, because in the absence
of clock skew, the most recent change to a configuration wins. For state
changes, it keeps the benefits of timestamps except when there's clock
skew, in which case there are not any benefits of timestamps anymore,
so vector clocks is the best that can be done. --[[Joey]]

(How would `GIT_ANNEX_VECTOR_CLOCK` interact with this? Maybe, when that's
set to a low number, it would be treated as the current time. So this would
let it be used and not, without issues, and also would let it be set to a
low number once, and not need to be changed, since git-annex would
increment as necessary.)

> The `vectorclock` branch has this mostly implemented. --[[Joey]]

> > [[done]] --[[Joey]]
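The hybrid rule sketched above (use the wall clock normally, but if the old log entry's timestamp is at or ahead of it, increment that timestamp instead) fits in a few lines. This is an illustrative sketch only, not git-annex's Haskell code; the `next_clock` name is invented:

```python
def next_clock(old_timestamp, now):
    """Pick the clock value for a new log entry.

    Normally the wall clock is used, so the most recent write wins.
    But if the old entry's timestamp is at or ahead of the wall clock
    (clock skew, or a bogus far-future timestamp), fall back to
    incrementing it, which behaves like a simple vector clock.
    """
    if old_timestamp >= now:
        return old_timestamp + 1
    return now

# No skew: the wall clock is used as-is.
print(next_clock(100, 200))   # 200
# Old entry carries a far-future timestamp: increment past it, so the
# corrected information still wins any later union merge.
print(next_clock(5000, 200))  # 5001
```

Note how this degrades gracefully: with accurate clocks it is exactly the timestamp scheme, and under arbitrary skew it still always produces a value strictly greater than the old one.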
@ -1,44 +0,0 @@
`git annex whereused` would report where in the git repository a
key is used, as a complement to `git-annex unused`.

Use cases include users not getting confused about why git-annex unused
says a key is used.

Also, it could scan through history to find where a key *was* used.
git-annex unused outputs a suggestion to use a rather hairy `git log -S`
command to do that currently.

If it does both these things, it could explain why git-annex unused
considers a key used despite a previous git rev referring to it. Eg:

    # git annex whereused SHA1--foo
    checking index... unused
    checking branches... unused
    checking tags... unused
    checking history... last used in master^40:somefile
    checking reflog... last used in HEAD@{30}:somefile

--[[Joey]]

> First pass is a keys db lookup to filenames.
>
> The historical pass can be done fairly efficiently by using
> `git log -Skey --exclude=*/git-annex --glob=* --exclude=git-annex --remotes=* --tags=* --pretty='%H' --raw`
> and fall back to `git log -Skey --walk-reflogs --pretty='%gd' --raw` if nothing was found.
>
> That makes git log check all commits reachable from those refs,
> probably as efficiently as possible, and stop after one match.
> It won't allow quite as nice a display as above.
>
> Parse the log output for commit sha and filename. Double-check
> by catting the file's object and making sure it parses as an annex
> link or pointer.
>
> Then use `git describe --contains --all` to get a description of the commit
> sha, which will be something like "master~2" or "origin/master~2",
> and add ":filename" to get the ref to output.
>
> Or, if it was found in the ref log, take the "HEAD@{n}" from log
> output, and add ":filename".

[[done]] --[[Joey]]
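The two-pass search in the notes above can be expressed as building two `git log` invocations, the second tried only when the first finds nothing. A small sketch; the helper name is invented, and the flags are taken verbatim from the notes (the shell quoting around `%H`/`%gd` is unnecessary in argv form):

```python
def whereused_log_commands(key):
    """Build the two `git log -S` command lines from the notes above:
    first search all branches/tags/remotes while excluding the
    git-annex branch, then fall back to walking the reflog."""
    refs_pass = [
        "git", "log", f"-S{key}",
        "--exclude=*/git-annex", "--glob=*", "--exclude=git-annex",
        "--remotes=*", "--tags=*", "--pretty=%H", "--raw",
    ]
    reflog_pass = [
        "git", "log", f"-S{key}",
        "--walk-reflogs", "--pretty=%gd", "--raw",
    ]
    return refs_pass, reflog_pass

refs_pass, reflog_pass = whereused_log_commands("SHA1--foo")
print(" ".join(refs_pass))
print(" ".join(reflog_pass))
```

Running these with `subprocess.run` and parsing the `%H` (commit sha) or `%gd` (`HEAD@{n}`) output plus the `--raw` filename lines would then give the "ref:filename" answer described above.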
@ -1,8 +0,0 @@
From [another thread](https://git-annex.branchable.com/todo/add_option_to_use_sqlite__39__s_synchronous__61__OFF/#comment-dbc9fdf5fd6d73f3e628bfe94b2a43a2):

> Quite possible there are situations where it fails to recover the lost information and does something annoying. But like I said, such situations can already happen

Maybe there are some simple ways to harden git-annex against possible weirdness following abrupt interruptions? E.g. using flag files to detect when a prior operation got interrupted,
and rebuilding the sqlite dbs from git data. Or tagging sqlite records with the timestamp of their creation, and not using the data if the relevant worktree files got modified since then.

> [[closing|done]] per comment --[[Joey]]
@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-06-25T16:06:59Z"
content="""
I could make that statement about basically any program. I think git-annex
deals with interruptions well. It is written with idempotency in mind. I
interrupt it all the time. It always behaves well. That is not a proof that
there is not some unforeseen situation where I have made a mistake.
"""]]
@ -1,15 +0,0 @@
If a tree containing a non-annexed file (checked directly into git) is exported,
and then an import is done from the remote, the new tree will have that
file annexed, and so merging it converts to annexed (there is no merge
conflict).

If the user is using annex.largefiles to configure or list
the non-annexed files, they'll be ok, but otherwise they'll be in for some
pain.

The importer could check for each file, if there's a corresponding file in
the branch it's generating the import for, if that file is annexed.
This corresponds to how git-annex add (and the smudge filter) handles these
files. But this might be slow when importing a large tree of files.

> [[fixed|done]] --[[Joey]]
@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-03-05T16:38:25Z"
content="""
This leads to worse behavior than just converting to annexed from
non-annexed. The converted file's contents don't verify, due to some
confusion between git and git-annex's use of SHA1. See
<https://git-annex.branchable.com/forum/__96__git_annex_import__96___from_directory_loses_contents__63__/>
"""]]
@ -1,30 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-03-05T16:42:03Z"
content="""
> The importer could check for each file, if there's a corresponding file in the branch it's generating the import for, if that file is annexed.

Should it check the branch it's generating the import for though?
If the non-annexed file is "foo" and master is exported, then in master
that file is renamed to "bar", the import should not look at the new master
to see if the "foo" from the remote should be annexed. The correct tree
to consult would be the tree that was exported to the remote last.

It seems reasonable to look at the file in that exported tree to see it was
non-annexed before, and if the ContentIdentifier is the same as what
was exported before, keep it non-annexed on import. If the ContentIdentifier
has changed, apply annex.largefiles to decide whether or not to annex it.

The export database stores information about that tree already,
but it does not keep track of whether a file was exported annexed or not.
So changing the database to include an indication of that, and using it
when importing, seems like a way to solve this problem, and without slowing
things down much.

*Alternatively*, the GitKey that git-annex uses for these files when
exporting is represented as a SHA1 key with no size field. That's unusual;
nothing else creates such a key usually. (Although some advanced users may
for some reason.) Just treating such keys as non-annexed files when
importing would be at least a bandaid if not a real fix.
"""]]
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-03-05T17:31:32Z"
content="""
Wait... The import code has a separate "GIT" key type that it uses
internally once it's decided a file should be non-annexed. Currently
that never hits disk. Using that rather than a SHA1 key for the export
database could be a solution.

(Using that rather than "SHA1" for the keys would also avoid
the problem that the current GitKey hardcodes an assumption
that git uses sha1..)
"""]]
@ -1,32 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-03-05T17:44:54Z"
content="""
In fact, a very simple patch that just makes a GitKey generate a
"GIT" key seems to have solved this problem! Files that were non-annexed
on export remain so on import, until they're changed, and then
annex.largefiles controls what happens.

Once non-annexed files have been exported using the new version, they'll
stay non-annexed on import. Even when an old version of git-annex is doing
the importing!

When an old annex had exported, and a new one imports, what happens is
the file gets imported as an annexed file. Exporting first with the new
version avoids that unwanted conversion.

Interestingly though, the annexed file when that conversion happens does
not use the SHA1 key from git, so its content can be retrieved. I'm not
quite sure how that problem was avoided in this case but something avoided
the worst behavior.

It would be possible to special case the handling of SHA1 keys without a
size to make importing from an old export not do the conversion. But that
risks breakage for some user who is generating their own SHA1 keys and not
including a size in them. Or for some external special remote that supports
IMPORTKEY and generates SHA1 keys without a size. It seems better to avoid
that potential breakage of unrelated things, and keep the upgrade process
somewhat complicated when non-annexed files were exported before, than to
streamline the upgrade.
"""]]

@ -1,30 +0,0 @@
When a FAT filesystem is unmounted and remounted, the inode numbers all
change. This makes import tree from a directory special remote on FAT
think the files have changed, and so it re-imports them. Since the content
is unchanged, the unnecessary work that is done is limited to hashing
the file on the FAT filesystem. But that can be a lot of work when the tree
being imported has a lot of large files in it.

This makes import tree potentially much slower than the legacy import
interface (although that interface also re-hashes when used with
--duplicate/--skip-duplicates).

Also, the content identifier log gets another entry, with a content
identifier with the new inode number. So over time this can bloat the log.

May be better to omit the inode number from the content
identifier for such a filesystem, instead relying on size and mtime?
Although that would risk missing swaps of files with the same size and
mtime, that seems like an unlikely thing, and in any case git-annex would
import the data, and only miss the renaming of the files. It would also
miss modifications that don't change size and preserve the mtime; such
modifications are theoretically possible, but unlikely.

But how to detect when it's a FAT filesystem with this problem?
The method git-annex uses when running on a FAT filesystem, of maintaining
an inode sentinel file and checking it to tell when inodes have changed
would need importing to write to the drive. That seems strange, and the
drive could even be read-only. Maybe the directory special remote should
just not use inode numbers at all?

> [[done]] --[[Joey]]

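The inode-free content identifier proposed in the todo above can be sketched in shell. `cid` is a hypothetical helper for illustration only (git-annex builds its content identifiers internally), and it assumes GNU coreutils `stat`:

```shell
# cid: build a content identifier from size and mtime only, omitting the
# inode number, as proposed above for filesystems (like FAT) whose inode
# numbers change on every mount. Hypothetical helper; assumes GNU stat,
# where %s is the size in bytes and %Y is the mtime as a unix timestamp.
cid() {
    stat -c '%s:%Y' -- "$1"
}
```

Two mounts of the same FAT volume would then yield identical identifiers for unchanged files, at the cost of missing a modification that preserves both size and mtime.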
@ -1,12 +0,0 @@
[glacier-cli](https://github.com/basak/glacier-cli) calls its own command `glacier` rather than `glacier-cli` or something else. This conflicts with [boto](https://github.com/boto/boto/)'s own `glacier` executable, as noted here:

* <https://github.com/basak/glacier-cli/issues/30>
* <https://github.com/basak/glacier-cli/issues/47>

Whilst the `glacier-cli` project should resolve this conflict, it would be good if git-annex could be made to use a configurable path for this executable, rather than just assuming that it has been installed as `glacier`. After all, its installation procedure is simply telling the user to run `ln -s`, so there's no reason why the user couldn't make the target of this command `~/bin/glacier-cli` rather than `~/bin/glacier` - it's really irrelevant what the source file inside the git repo is called.

Of course, [`checkSaneGlacierCommand`](https://github.com/joeyh/git-annex/blob/master/Remote/Glacier.hs#L307) is still very much worth having, for safety.

> Well, it never got renamed, and checkSaneGlacierCommand does check for
> the conflict, so I don't see any point in making the name configurable.
> [[done]] --[[Joey]]

@ -1,7 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="basak"
|
||||
subject="comment 1"
|
||||
date="2015-04-24T15:48:48Z"
|
||||
content="""
|
||||
Well, it's supposed to be a command line command, and I don't type `cd-cli` and `ls-cli`. So while `glacier-cli` might be fine as a project name and is fine for a name for integration, I don't think it makes sense to call it that in `/usr/bin/`, which is why I didn't. I'd prefer to have seen that boto integrate an improved `glacier` command, or for packaging to provide this one as an alternative (like `mawk` vs. `gawk` as `/usr/bin/awk`). But upstream boto considers themselves deprecated, so that's not going to happen. One of these days I'll package glacier-cli up for Debian, at which point I'll see if the boto maintainer is interested in doing something, since I don't actually believe anybody uses boto's glacier command (since it's mostly useless).
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://adamspiers.wordpress.com/"
nickname="adamspiers"
subject="Good point"
date="2015-04-24T15:55:29Z"
content="""
glacier-cli would be a rather silly name to put in `/usr/bin`. How about `glcr`, as suggested [here](https://github.com/basak/glacier-cli/issues/30#issuecomment-95972840)?
"""]]

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2015-04-24T17:23:10Z"
content="""
I don't want to complicate git-annex more with configurable names for
programs, and glacier is not at all special in this regard, any program
could be installed under any name. We pick non-conflicting names to
avoid integration nightmares. Pick a name and I'll use it.
"""]]

@ -1,26 +0,0 @@
Like was recently done for preferred content, when checking numcopies for a
drop, it could check if other files are using the same key, and if so check
that their numcopies (and mincopies) is satisfied as well.

There would be an efficiency tradeoff of course, since it would have to
query the keys db. The question I suppose is, if someone sets different
numcopies for different files via .gitattributes, and they use the same
key, will the user think it's a problem that numcopies can be violated in
some circumstances. And I think that users would maybe consider that to be
a problem, if they happened to encounter the behavior.

It may also be worth considering making --all (etc) also check numcopies of
associated files. Although then, in a bare repo, it would behave
differently than in a non-bare repo. (Also if this is done, the preferred
content checking should also behave the same way.) The docs for --all
do say that it bypasses checking .gitattributes numcopies.
--[[Joey]]

> Note that the assistant and git-annex sync already check numcopies
> for all known associated files, so already handled this for unlocked
> files. With the recent change to also track
> associated files for locked files, they also handle it for those.
>
> But, git-annex drop/move/mirror don't yet.
>
> > [[fixed|done]] (did not change --all behavior) --[[Joey]]

@ -1,3 +0,0 @@
Right now, non-annexed files get passed through the `annex` clean/smudge filter (see [[forum/Adding_files_to_git__58___Very_long___34__recording_state_in_git__34___phase]]). It would be better if `git-annex` configured the filter only for the annexed unlocked files, in the `.gitattributes` file at the root of the repository.

> not a viable solution, [[done]] --[[Joey]]

@ -1,19 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-11-22T16:01:26Z"
content="""
It immediately occurs to me that the proposal would break this:

    git annex add foo
    git annex add bar
    git annex unlock bar
    git mv bar foo
    git commit -m add

Since foo was a locked file, gitattributes would prevent it from being
smudged, so the large content that was in bar gets committed directly to git.

The right solution is to improve the smudge/clean filter interface so it's
not so slow, which there is copious discussion of elsewhere.
"""]]

@ -1,46 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="moving unlocked file onto locked file isn't possible"
date="2019-11-24T16:36:24Z"
content="""
`git mv` won't move an unlocked file onto a locked file (trace below).

\"The right solution is to improve the smudge/clean filter interface\" -- of course, but realistically, do you think git devs can be persuaded to do [[this|todo/git_smudge_clean_interface_suboptiomal]] sometime soon? Even if yes, it still seems better to avoid adding a step to common git workflows, than to make the step fast.

[[!format sh \"\"\"
(master_env_v164_py36) 11:14 [t1] $ ls
bar foo
(master_env_v164_py36) 11:14 [t1] $ git init
Initialized empty Git repository in /tmp/t1/.git/
(master_env_v164_py36) 11:14 [t1] $ git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
(master_env_v164_py36) 11:14 [t1] $ git annex add foo
add foo ok
(recording state in git...)
(master_env_v164_py36) 11:14 [t1] $ git annex add bar
add bar ok
(recording state in git...)
(master_env_v164_py36) 11:14 [t1] $ ls -alt
total 0
drwxrwxr-x 8 ilya ilya 141 Nov 24 11:14 .git
drwxrwxr-x 3 ilya ilya 40 Nov 24 11:14 .
lrwxrwxrwx 1 ilya ilya 108 Nov 24 11:14 bar -> .git/annex/objects/jx/MV/MD5E-s4--c157a79031e1c40f85931829bc5fc552/MD5E-s4--c157a79031\
e1c40f85931829bc5fc552
lrwxrwxrwx 1 ilya ilya 108 Nov 24 11:14 foo -> .git/annex/objects/00/zZ/MD5E-s4--d3b07384d113edec49eaa6238ad5ff00/MD5E-s4--d3b07384d1\
13edec49eaa6238ad5ff00
drwxrwxrwt 12 root root 282 Nov 24 11:14 ..
(master_env_v164_py36) 11:14 [t1] $ git annex unlock bar
unlock bar ok
(recording state in git...)
(master_env_v164_py36) 11:16 [t1] $ git mv bar foo
fatal: destination exists, source=bar, destination=foo
(master_env_v164_py36) 11:17 [t1] $
\"\"\"]]
"""]]

@ -1,126 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="even git mv -f seems to work correctly"
date="2019-11-24T17:25:32Z"
content="""
Also, `git mv` seems to reuse the already-smudged object contents of the source file for the target file, so even with `git mv -f` only the checksum gets checked into git:

[[!format sh \"\"\"
+ cat ./test-git-mv
#!/bin/bash

set -eu -o pipefail -x

cat $0

TEST_DIR=/tmp/test_dir
mkdir -p $TEST_DIR
chmod -R u+w $TEST_DIR
rm -rf $TEST_DIR
mkdir -p $TEST_DIR
pushd $TEST_DIR

git init
git annex init

git --version
git annex version

rm .git/info/attributes
echo foo > foo
echo bar > bar
git annex add foo bar
git check-attr -a foo
git check-attr -a bar
echo 'bar filter=annex' > .gitattributes
git add .gitattributes
git check-attr -a foo
git check-attr -a bar

git annex unlock bar
git mv bar foo || true
git mv -f bar foo
git commit -m add
git log -p

+ TEST_DIR=/tmp/test_dir
+ mkdir -p /tmp/test_dir
+ chmod -R u+w /tmp/test_dir
+ rm -rf /tmp/test_dir
+ mkdir -p /tmp/test_dir
+ pushd /tmp/test_dir
/tmp/test_dir /tmp
+ git init
Initialized empty Git repository in /tmp/test_dir/.git/
+ git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
+ git --version
git version 2.20.1
+ git annex version
git-annex version: 7.20191024-g6dc2272
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 7
+ rm .git/info/attributes
+ echo foo
+ echo bar
+ git annex add foo bar
add foo ok
add bar ok
(recording state in git...)
+ git check-attr -a foo
+ git check-attr -a bar
+ echo 'bar filter=annex'
+ git add .gitattributes
+ git check-attr -a foo
+ git check-attr -a bar
bar: filter: annex
+ git annex unlock bar
unlock bar ok
(recording state in git...)
+ git mv bar foo
fatal: destination exists, source=bar, destination=foo
+ true
+ git mv -f bar foo
+ git commit -m add
[master (root-commit) 8610c0d] add
2 files changed, 2 insertions(+)
create mode 100644 .gitattributes
create mode 100644 foo
+ git log -p
commit 8610c0d8f327140608e71dc229f167731552d284
Author: Ilya Shlyakhter <ilya_shl@alum.mit.edu>
Date: Sun Nov 24 12:24:28 2019 -0500

add

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..649f07e
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1 @@
+bar filter=annex
diff --git a/foo b/foo
new file mode 100644
index 0000000..266ae50
--- /dev/null
+++ b/foo
@@ -0,0 +1 @@
+/annex/objects/MD5E-s4--c157a79031e1c40f85931829bc5fc552

\"\"\"]]
"""]]

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="installing clean/smudge filter lazily"
date="2021-03-19T02:30:13Z"
content="""
\"the proposal would break this\" -- suppose [[`git-annex-unlock`|git-annex-unlock]] was changed to install the clean/smudge filter for `*` if not installed yet?

Related: [Avoid lengthy \"Scanning for unlocked files ...\"](https://git-annex.branchable.com/todo/Avoid_lengthy___34__Scanning_for_unlocked_files_...__34__/)
"""]]

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-03-23T16:02:47Z"
content="""
> "the proposal would break this" -- suppose git-annex-unlock was changed to install the clean/smudge filter for * if not installed yet?

git-annex unlock is not the only way unlocked files can appear in your
tree. Consider git pull.
"""]]

@ -1,28 +0,0 @@
Some things to do with the [[design/P2P_protocol]]
are works in progress, needing a future flag day to complete.

## VERSION over tor

Old versions of git-annex, before 6.20180312, which speak the P2P protocol
over tor, don't support VERSION, and attempting to negotiate a version
will cause the server to hang up the connection. To deal with this
historical bug, the version is not currently negotiated when using the
protocol over tor. At some point in the future, when all peers can be
assumed to be upgraded, this should be changed.

> [[done]] --[[Joey]]

## git-annex-shell fallbacks

When using git-annex-shell p2pio, git-annex assumes that if it exits 1,
it does not support that, and falls back to the old sendkey/rerecvkey,
etc.

At some point in the future, once all git-annex and git-annex-shell
can be assumed to be upgraded to 6.20180312, this fallback can be removed.
It will allow removing a lot of code from git-annex-shell and a lot of
fallback code from Remote.Git.

> [[done]] --[[Joey]]

[[!tag confirmed]]

@ -1,12 +0,0 @@
git-annex has good support for running commands in parallel, but there
are still some things that could be improved, tracked here:

* Maybe support -Jn in more commands. Just needs changing a few lines of code
  and testing each.

* Maybe extend --jobs/annex.jobs for more control. `--jobs=cpus` is already
  supported; it might be good to have `--jobs=cpus-1` to leave a spare
  cpu to avoid contention, or `--jobs=remotes*2` to run 2 jobs per remote.

> Ok, those are maybe good ideas, but this needs to be closed at some
> point, so [[done]] --[[Joey]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="parallelization"
date="2019-11-27T17:23:14Z"
content="""
When operating on many files, maybe run N parallel commands where i'th command ignores paths for which `(hash(filename) modulo N) != i`. Or, if git index has size I, i'th command ignores paths that are not lexicographically between `index[(I/N)*i]` and `index[(I/N)*(i+1)]` (for index state at command start). Extending [[git-annex-matching-options]] with `--block=i` would let this be done using `xargs`.
"""]]

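The hash-partitioning idea in the comment above can be sketched in shell. `pick_block` is a hypothetical helper for illustration; `--block=i` is a proposed option, not something git-annex implements:

```shell
# pick_block N I: print only those stdin lines (filenames) whose cksum,
# taken modulo N, equals I. Running one instance per I in 0..N-1 splits
# the file list into N disjoint blocks, which is what the proposed
# --block=i matching option would do internally.
pick_block() {
    n=$1; i=$2
    while IFS= read -r f; do
        h=$(printf '%s' "$f" | cksum | cut -d ' ' -f 1)
        [ $((h % n)) -eq "$i" ] && printf '%s\n' "$f"
    done
}
```

A wrapper could then run something like `git ls-files | pick_block 4 0 | xargs git annex get` in one of four parallel processes, each with its own process tree to kill and retry.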
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-01-30T19:24:47Z"
content="""
How would running parallel commands with xargs be better than the current
-J?
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="running parallel commands with xargs"
date="2020-02-20T20:48:33Z"
content="""
\"How would running parallel commands with xargs be better than the current -J\" -- it would allow writing wrappers that kill/retry stuck git-annex process trees, as suggested [[here|https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7]].
"""]]

@ -1,28 +0,0 @@
Hello,

By means of bisection I have determined that commit 4bf7940d6b912fbf692b268f621ebd41ed871125, recently uploaded to Debian after the bullseye freeze, is responsible for breaking the annex-to-annex-reinject script which ships with Git::Annex. Here is a minimal reproducer of the problem:

    spwhitton@melete:~/tmp>echo foo >bar
    spwhitton@melete:~/tmp>mkdir annex
    spwhitton@melete:~/tmp>cp bar annex
    spwhitton@melete:~/tmp>cd annex
    spwhitton@melete:~/tmp/annex>git init
    spwhitton@melete:~/tmp/annex>git annex add bar
    spwhitton@melete:~/tmp/annex>git annex drop --force bar
    spwhitton@melete:~/tmp/annex>git annex reinject --known /home/spwhitton/tmp/bar
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    fatal: './../bar' is outside repository at '/home/spwhitton/tmp/annex'
    git-annex: fd:15: Data.ByteString.hGetLine: end of file

--spwhitton

> [[fixed|done]] --[[Joey]]

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-10-01T16:42:36Z"
content="""
Also happens with a relative path to the file. And also
`git annex reinject ../bar bar` fails the same way.

Fixed. In case you want to cherry-pick the fix, it's the commit adding
this comment, as well as the 2 prior commits fixing bugs in dirContains.
"""]]

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="spwhitton"
avatar="http://cdn.libravatar.org/avatar/9c3f08f80e67733fd506c353239569eb"
subject="comment 2"
date="2021-10-02T17:04:02Z"
content="""
Thanks so much for the fix! It looks like cherry-picking breaks the test suite, so I'll probably just wait for the next release.
"""]]

@ -1,3 +0,0 @@
Small files might also be used for performance reasons, so there should be an option to also automatically fix merge conflicts for small files in git-annex-sync.

> [[wontfix|done]] --[[Joey]]

@ -1,27 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-06-23T16:43:34Z"
content="""
There's a reason people don't want git to automatically resolve merge
conflicts of code, and for all git-annex knows small files are code.

Or looking at it from the other perspective, non-technical git-annex
assistant users need an automatic merge conflict resolution of annexed
files, since the assistant commits changes to those files and otherwise
they could end up with a conflict they don't understand how to resolve.

And, git-annex sync inherited that from the assistant. Which may or may not
have been the best decision. One thing in favor of it being a reasonable
decision is that a conflict in an annexed file will mostly be resolved by
picking one version of the file or the other, unlike conflicts in source
code which are often resolved by using brain sweat. Large and often binary
files not being very possible for human brains to deal with directly. Or
perhaps by a tool that combines the two versions in some way, in which case
the conflict resolution leaves both versions easily accessible for such a
tool.

So git-annex does know, or can make some reasonable assumptions, about
annexed files, but generalizing those assumptions to small files would not
make sense.
"""]]

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 2"
date="2021-06-24T17:43:36Z"
content="""
The idea is to solve the conflicts in a similar way to conflicts in annexed files. I.e. by creating two files file.version-a and file.version-b.
"""]]

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="resolving merge conflicts"
date="2021-06-24T18:03:40Z"
content="""
\"Small files\" here means \"non-annexed files\", right?

Whether a file is annexed, and whether its merge conflicts should be auto-resolved by creating two files `file.version-a` and `file.version-b`, seem like orthogonal things.
One might check small binary files directly into git, and one might annex source code files e.g. just for the simplicity of annexing everything (as [[DataLad|projects/datalad]] does or at least used to).

So, maybe, `.gitattributes` should control which files' merge conflicts get auto-resolved?
"""]]

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-06-25T16:12:02Z"
content="""
I have no interest in continuing a feature added for the assistant down a
road to making source code files that are checked into git be handled in
some other way when merging.
"""]]

@ -1,9 +0,0 @@
It'd be very useful if you could specify a size limit for drop/move/copy/get-type operations. `git annex move --to other --limit 1G` would move at most 1G of data to the other repo for example.

This way you could quickly "garbage collect" a few dozen GiB from your annex repo when you're running out of space without dropping everything for example.

Another issue this could be used to mitigate is that, for some reason, git-annex doesn't properly auto-stop the transfer when the repos on my external drives are full.

I imagine there are many more use-cases where quickly being able to set a limit for the amount of data a command should act on could come in handy.

> [[done]] --[[Joey]]

@ -1,34 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-06-04T18:07:44Z"
content="""
I agree this could be useful.

Implementation is complicated by it needing to only count the size when a
file is acted on. Eg `git annex get` shouldn't stop when it's seen enough
files that already have content present.

So it seems it would need to be implemented next to where showStartMessage
is used in commandAction, looking at the size of the key in the
StartMessage (or possibly file when there's no key?) and when it would go
over the limit, rather than proceeding to perform the action it could skip
doing anything and go on to the next file.

I don't think there is a good way to make it immediately exit
when it reaches the limit, so if there were subsequent smaller files
after a skipped file that could be processed still, it still would.

It would probably also make sense to make it later exit with 101 like
--time-limit does, or another special exit code, to indicate it didn't
process everything.

Hmm, if an action fails, should the size of the file be counted or not?
If failures are not counted, incomplete transfers could result in a
lot more work/disk space than desired. But if failures are counted,
after failing to drop a bunch of files, or failing early on to get a bunch
of files, it could stop seemingly prematurely. Also there's a problem with
concurrency, if it needs to know the result of running jobs before deciding
whether to start a new job. Seems no entirely good answer here, but the
concurrency problem seems only solvable by updating the count at start time.
"""]]

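The skip-don't-stop behavior described above (an over-budget file is skipped, but smaller later files may still fit) can be sketched in shell. `select_upto` is a hypothetical helper, not the real --size-limit implementation, which lives in git-annex's command dispatch:

```shell
# select_upto LIMIT: read "size name" pairs from stdin and print the
# names whose sizes fit within LIMIT bytes total. A too-big file is
# skipped rather than ending the scan, so later smaller files can
# still be selected, matching the behavior described in the comment.
select_upto() {
    limit=$1; total=0
    while read -r size name; do
        if [ $((total + size)) -le "$limit" ]; then
            total=$((total + size))
            printf '%s\n' "$name"
        fi
    done
}
```

With a budget of 9 bytes and input files of 5, 10, and 3 bytes, the 10-byte file is skipped but the 3-byte file after it is still selected.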
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-06-04T20:35:26Z"
content="""
--size-limit is implemented, for most git-annex commands.

Ones like `git-annex add` that don't operate on annexed files don't support
it, at least yet.

Ones like git-annex export/import/sync I'm not sure it makes sense to
support it, since they kind of operate at a higher level than individual
files.
"""]]

@ -1,18 +0,0 @@
When adding a lot of small files to git with `git annex add`,
it is slow because git runs the smudge filter on all files
and [[that_is_slow|todo/git_smudge_clean_interface_suboptiomal]].

But `git-annex add --force-small` is much much faster, because that
bypasses git add entirely, hashing the content and staging it in the index
from git-annex. So could that same method be used to speed up the slow case?

My concern with doing this is that there may be things that `git add`
does that are not done when bypassing it. The only one I can think of is,
if the user has other smudge/clean filters than the git-annex one
installed, they would not be run either. It could be argued that's a bug
with the existing `--force-small` too, but at least that's not the default.

Possible alternate approach: Unsetting filter.annex.smudge and
filter.annex.clean when running `git add`?

> This approach is a winner! [[done]] --[[Joey]]

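The winning approach can be approximated from the command line. This is a sketch only: `add_bypassing_annex_filter` is a hypothetical wrapper (git-annex does this internally), and it substitutes `cat` as a passthrough filter rather than truly unsetting the config:

```shell
# add_bypassing_annex_filter: run git add with the annex clean/smudge
# filter replaced by a passthrough for this one invocation, so content
# is staged as-is without the slow filter round trip.
add_bypassing_annex_filter() {
    git -c filter.annex.smudge=cat -c filter.annex.clean=cat add -- "$@"
}
```

A file added this way is staged with its literal content, which is only appropriate for small files that belong in git rather than the annex.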
@ -1,19 +0,0 @@
reconcileStaged should be able to be sped up by improving streaming through
git, similar to [[!commit 0f54e5e0ae73b89bb6743bf298915619da00c3f4]].

Normally it's plenty fast enough, but users who often switch between
branches that have tens to hundreds of thousands of diverged files will
find it slow, and this should speed it up by somewhere around 3x (excluding
sqlite writes). --[[Joey]]

> Implemented this. Benchmarked it in a situation where 100,000 annexed
> files were added to the index (by checking out a branch with more annexed
> files). old: 50 seconds; new: 41 seconds

> Also benchmarked when 100,000 annexed files were removed from the index.
> old: 26 seconds; new: 17 seconds.
>
> Adding associated files to the sqlite db is clearly more expensive than
> removing from it.
>
> [[done]] --[[Joey]]

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="keys db optimization"
date="2021-06-02T16:53:02Z"
content="""
\"users who often switch between branches that have tens to hundreds of thousands of diverged files will find it slow\" -- that's my use case ;) Could one keys-to-files db be kept per branch?

Maybe the keys db could be split, based e.g. on a prefix of the md5 of the key, into separate sqlite files, and the writing to them parallelized?
It's common to be working on a many-core machine.

Is the keys-to-locked-files db used for anything besides detecting keys used by more than one file? For that one purpose there might be faster solutions.
But, if it's implemented, maybe it could also be used to remove the [[limitation|git-annex-preferred-content]] that \"when a command is run with the --all option, or in a bare repository, there is no filename associated with an annexed object, and so \"include=\" and \"exclude=\" will not match\"?
"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="matching include/exclude based on file extension in the key"
date="2021-06-02T17:02:58Z"
content="""
Actually, the include/exclude limitation above could be removed by just looking at the keys themselves, if the include/exclude expression is of the form `*.ext` and the keys include file extensions.
"""]]
@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-06-04T17:45:21Z"
content="""
It is not very useful to detect if a key is used by more than one file if
you don't know the files. In any case, yes, the keys db is used for a large
number of things, when it comes to unlocked files.

[[todo/numcopies_check_other_files_using_same_key]] has some thoughts on
--all, but I doubt it will make sense to change --all.
"""]]
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-06-04T17:49:43Z"
content="""
Keys with extensions do not necessarily have the same extension as used in
the worktree files that include/exclude match on.

I'm not sure why all these wild ideas are being thrown out there when this
todo is about a specific, simple improvement that will speed up the git
part of the scanning by about 3x? It's like you somehow consider this an
emergency where increasingly wild measures have to be taken to prevent me
from making a terrible mistake?
"""]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject=""why all these wild ideas are being thrown out there""
date="2021-06-04T22:15:32Z"
content="""
It just seemed like all the speedup possibilities from `annex.supportunlocked=false` are getting undone to optimize a not-too-common scenario?
"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2021-06-07T15:48:07Z"
content="""
annex.supportunlocked=false still prevents the smudge/clean filter from
being used, which can significantly speed up git if the repository has a
lot of files stored in git.
"""]]
@ -1,3 +0,0 @@
For commands like [[`git-annex-whereis`|git-annex-whereis]] that take a `path` argument, it would help if this could be generalized to taking a [tree-ish](https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddeftree-ishatree-ishalsotreeish). E.g. for `git-annex-whereis` this could be used to look up where previous file versions are stored.

> [[done]] before this was filed
|
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-05-12T16:23:02Z"
content="""
This is already supported by whereis (and quite a lot of other commands)
with the --branch option, which is documented to support a treeish.
"""]]
@ -1,9 +0,0 @@
Based on an irc conversation earlier today:

    19:50 < warp> joeyh: what is the best way to figure out the (remote) filename for a file stored in an rsync remote?

    20:43 < joeyh> warp: re your other question, probably the best thing would be to make the whereis command print out locations for each remote, as it always does for the web special remotes

> Several remotes do now populate whereis with urls, but an rsync remote
> does not in general have http urls to content in it. So I don't think
> it makes sense to do anything for rsync remotes. [[closing|done]] --[[Joey]]