Merge branch 'master' into sqlite

Joey Hess 2019-12-19 16:26:23 -04:00
commit 02e00fd7ab
GPG key ID: DB12DB0FF05F8F38
37 changed files with 314 additions and 128 deletions

@ -0,0 +1,32 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-12-19T15:29:40Z"
content="""
Retargeting this todo at something useful post-git-add-kerfuffle,
annex.addunlocked could usefully be a pagespec to allow adding some files
unlocked and others locked (by git-annex add only, not git add).
"true" would be the same as "anything" and "false" the same as "nothing".
---
It may also then make sense to let it be configured in .gitattributes.
Although, the ugliness of setting a pagespec in .gitattributes,
as was done for annex.largefiles, coupled with the overhead of needing to
query that from git-check-attr for every file, makes me wary.
(A surprising amount of `git-annex add` time is spent querying the
annex.largefiles and annex.backend attributes. Setting the former in
gitconfig avoids the attribute query and speeds up add of smaller files by
2%. Granted I've sped up add (except hashing) by probably 20% this month,
and with large files the hashing dominates.)
The query overhead could maybe be finessed: Since adding a file
already queries gitattributes for two other things, a single query could be
done for a file and the result cached.
Letting it be globally configured via `git-annex config` is an alternative
that I'm leaning toward.
(That would also need some caching, easier to implement and faster
since it is not a per-file value as the gitattribute would be.)
"""]]
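
The "single query, cached result" idea from the comment above can be sketched roughly as follows. This is only an illustration in Python, not git-annex's implementation: `AttrCache`, `AttrQuery` and `fake_query` are hypothetical names, and treating `annex.addunlocked` as a gitattribute is the assumption under discussion, not an existing feature as far as this comment says.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type: given a path and a list of attribute names,
# return a mapping of attribute name to value for that path.
AttrQuery = Callable[[str, Sequence[str]], Dict[str, str]]


class AttrCache:
    """Look up every needed attribute for a file in one query, then cache."""

    def __init__(self, query: AttrQuery, attrs: Sequence[str]):
        self._query = query
        self._attrs = tuple(attrs)
        self._cache: Dict[str, Dict[str, str]] = {}
        self.queries = 0  # number of calls made to the underlying query

    def get(self, path: str, attr: str) -> str:
        # The first request for a path fetches all attributes at once;
        # later requests for the same path are served from the cache.
        if path not in self._cache:
            self.queries += 1
            self._cache[path] = self._query(path, self._attrs)
        return self._cache[path].get(attr, "unspecified")


def fake_query(path: str, attrs: Sequence[str]) -> Dict[str, str]:
    # Stand-in for a real gitattributes lookup; the values are made up.
    table = {"annex.largefiles": "largerthan=100kb", "annex.backend": "SHA256E"}
    return {a: table.get(a, "unspecified") for a in attrs}


cache = AttrCache(
    fake_query,
    ["annex.largefiles", "annex.backend", "annex.addunlocked"],
)
```

With this shape, asking for a third attribute of the same file costs no extra query, which is the point of the finesse; whether the per-file cache is worth its memory is a separate question.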

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="named pipes as destination files"
date="2019-12-18T18:41:57Z"
content="""
\"getting object content from remotes involve a destination file that is written to\" -- what happens if git-annex makes a named pipe, and passes that as the destination file name to the remote?
"""]]

@ -0,0 +1,32 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-12-19T16:08:09Z"
content="""
Hmm, it used to be that `git add .` would smudge all dotfiles without that
line, but now annex.largefiles has to be configured for it to smudge
anything.
So, this could be dealt with in annex.largefiles. Both `anything` and
`include=*` currently match dotfiles. It's kind of weird really that `*`
matches dotfiles; it does not in the shell. If `*` did not match dotfiles
(and `anything` is just an alias for `include=*`), it would be fairly safe
to remove the `.* !filter` line by default. (If annex.largefiles has a
content-based setting, and a dotfile is large enough or the right mime type
or whatever, it's reasonable to default to smudging it.)
Then, you could set annex.largefiles to match the dotfiles you want,
eg `include=* or include=.mydotfile`. You could put the config in
.gitattributes if you want to configure it globally.
This change to annex.largefiles would also let `git-annex add`
stop skipping dotfiles by default; instead annex.largefiles would not match
dotfiles unless the user explicitly configured it to, and so the dotfiles
would be added as small files, directly to git.
I like this because it unifies the behaviors of the two ways of adding,
and it reduces the complexity, rather than adding more.
Removing the `.* !filter` line by default
would need to be done as part of the v8 upgrade, or a later upgrade.
"""]]
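
To make the proposed matching rule concrete, here is a small Python sketch of a glob where `*` does not match dotfiles, as in the shell. It is a simplified model under stated assumptions, not git-annex's matcher: `glob_skipping_dotfiles` and `largefiles` are hypothetical helpers, and only `include=` terms joined by `or` are modelled. Python's `fnmatch` happens to behave like the current pagespec behaviour described above, in that `*` does match a leading dot.

```python
import fnmatch
import posixpath
from typing import Iterable


def glob_skipping_dotfiles(path: str, pattern: str) -> bool:
    """Match like fnmatch, except that a dotfile is only matched by a
    pattern whose final component itself starts with a dot."""
    name = posixpath.basename(path)
    pattern_name = posixpath.basename(pattern)
    if name.startswith(".") and not pattern_name.startswith("."):
        return False
    return fnmatch.fnmatchcase(path, pattern)


def largefiles(path: str, includes: Iterable[str]) -> bool:
    # Models "include=A or include=B": true when any include glob matches.
    return any(glob_skipping_dotfiles(path, p) for p in includes)


# The example setting from the comment: include=* or include=.mydotfile
example_includes = ["*", ".mydotfile"]
```

Under this rule, `largefiles(".mydotfile", example_includes)` is true because of the explicit second term, while other dotfiles fall through and would be added to git as small files.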

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-12-19T17:17:31Z"
content="""
`*` is not only used in annex.largefiles, but other pagespecs too.
Like preferred content:
exclude=archive/*
So changing `*` to not match dotfiles would have wide reaching effects,
and it's really not good for different versions of git-annex to parse
preferred content expressions differently. And it seems too confusing to
have `*` match differently in annex.largefiles than in other pagespecs.
Having a single config that controls both kinds of adds still seems like a
good idea, but I don't know what that config should be.
annex.largedotfiles?
"""]]

@ -0,0 +1,3 @@
Profiling of `git annex find --not --in web` suggests that converting Ref
to contain a ByteString, rather than a String, would eliminate a
fromRawFilePath that uses about 1% of runtime.

@ -0,0 +1,21 @@
Often a command will need to read a number of files from the git-annex
branch, and it uses getJournalFile for each to check for any journalled
change that has not reached the branch. But typically, the journal is empty
and in such a case, that's a lot of time spent trying to open journal files
that do not exist.
Profiling, eg, `git annex find --in web` shows that things called by getJournalFile
use around 5% of runtime.
What if, once at startup, it checked whether the journal was entirely empty?
If so, it can remember that, and avoid reading journal files.
Perhaps paired with staging the journal if it's not empty.
This could lead to behavior changes in some cases where one command is
writing changes and another command used to read them from the journal and
may no longer do so. But any such behavior change is of a behavior that
used to involve a race; the reader could just as well be ahead of the
writer and it would have already behaved as it would after the change.
But: When a process writes to the journal, it will need to update its state
to remember it's no longer empty. --[[Joey]]
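
A minimal sketch of the idea, in Python rather than git-annex's Haskell, and with a hypothetical `Journal` class and flat file names standing in for the real journal layout: check once at startup whether the journal directory is empty, skip per-file opens while it is known to be empty, and clear that flag whenever this process writes to the journal.

```python
import os
import tempfile
from typing import Optional


class Journal:
    def __init__(self, directory: str):
        self.directory = directory
        # One directory scan at startup replaces one failed open per file.
        with os.scandir(directory) as entries:
            self.known_empty = next(entries, None) is None
        self.opens = 0  # number of attempts to open a journal file

    def read(self, name: str) -> Optional[bytes]:
        # Journal was empty at startup and we have not written since:
        # nothing to find, so do not touch the filesystem at all.
        if self.known_empty:
            return None
        self.opens += 1
        try:
            with open(os.path.join(self.directory, name), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write(self, name: str, content: bytes) -> None:
        with open(os.path.join(self.directory, name), "wb") as f:
            f.write(content)
        # This process must now remember the journal is no longer empty.
        self.known_empty = False
```

Note that the flag only tracks writes made by this process. A write from another process after startup goes unseen, which is the behavior change discussed above and argued to be equivalent to the reader having run slightly earlier.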

@ -9,7 +9,7 @@ Benchmarking `git-annex find`, speedups range from 28-66%. The files fly by
much more snappily. Other commands likely also speed up, but do more work
than find so the improvement is not as large.
The `bs` branch is in a mergeable state now.
The `bs` branch is in a mergeable state now. [[done]]
Stuff not entirely finished: