Commit graph

44025 commits

Author SHA1 Message Date
Joey Hess
eb42935e58
Windows: Fix CRLF handling in some log files
In particular, the mergedrefs file was written with CR added to each line,
but read without CRLF handling. This resulted in each update of the file
adding CR to each line in it, growing the number of lines, while also
preventing the optimisation from working, so it remerged unncessarily.

writeFile and readFile do NewlineMode translation on Windows. But the
ByteString conversion prevented that from happening any longer.

I've audited for other cases of this, and found three more
(.git/annex/index.lck, .git/annex/ignoredrefs, and .git/annex/import/). All
of those also only prevent optimisations from working. Some other files are
currently both read and written with ByteString, but old git-annex may have
written them with NewlineMode translation. Other files are at risk for
breakage later if the reader gets converted to ByteString.

This is a minimal fix, but should be enough, as long as I remember to use
fileLines when splitting a ByteString into lines. This leaves files written
using ByteString without CR added, but that's ok because old git-annex has
no difficulty reading such files.

When the mergedrefs file has gotten lines that end with "\r\r\r\n", this
will eventually clean it up. Each update will remove a single trailing CR.

Note that S8.lines is still used in eg Command.Unused, where it is parsing
git show-ref, and similar in Git/*. git commands don't include CR in their
output so that's ok.

Sponsored-by: Joshua Antonishen on Patreon
2023-10-30 14:23:23 -04:00
Joey Hess
ea2876ae77
add PackageImports
This makes loading it in ghci work when both crypton and cryptonite are
installed.
2023-10-30 14:10:46 -04:00
mih
2299ee2519 Report on a possibly unknown slowness of lookupkey 2023-10-30 07:45:05 +00:00
Joey Hess
bef1d68d13
Merge branch 'master' of ssh://git-annex.branchable.com 2023-10-27 17:47:50 -04:00
Joey Hess
a09d2954c5
remove wrongly nested comment
This happened to prevent checkout on windows, due to filename too long.
2023-10-27 17:47:23 -04:00
jkniiv
d0b381bd42 Added a comment 2023-10-27 18:45:03 +00:00
Joey Hess
73c02e4fe3
comment 2023-10-26 15:22:22 -04:00
Joey Hess
3330d80ae2
Merge branch 'master' of ssh://git-annex.branchable.com 2023-10-26 14:23:08 -04:00
Joey Hess
8f325d67ae
display summary after concurrent output is complete
Otherwise, it could have buffered output that was displayed after the
summary. I started seeing that on my 12 core laptop, probably due to the
increased amount of concurrency.

Also seems possible, that, since the summary did exit, it might have
prevented displaying buffered output sometimes, or had other bad
results.
2023-10-26 14:19:41 -04:00
Joey Hess
7776d03355
avoid unused import 2023-10-26 14:00:21 -04:00
Joey Hess
0f3b78ec29
simplify 2023-10-26 14:00:02 -04:00
Joey Hess
e4ef688b98
RawFilePath conversion
May slightly speed up startup.
2023-10-26 13:58:49 -04:00
nobodyinperson
af6ecc9be5 Added a comment 2023-10-26 17:46:28 +00:00
Joey Hess
b0302c937d
avoid double RawFilePath conversion
Minor optimisation.
2023-10-26 13:41:17 -04:00
Joey Hess
d9fd205cbb
push RawFilePath down into Annex.ReplaceFile
Minor optimisation, but a win in every case, except for a couple where
it's a wash.

Note that replaceFile still takes a FilePath, because it needs to
operate on Chars to truncate unicode filenames properly.
2023-10-26 13:36:49 -04:00
Joey Hess
c873586e14
eliminate s2w8 and w82s
Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode
characters. Luckily, genUUIDInNameSpace is only ever used on ASCII
strings as far as I can determine. In particular, git-remote-gcrypt's
gcrypt-id is an ASCII string.
2023-10-26 13:12:57 -04:00
Joey Hess
3742263c99
simplify base64 to only use ByteString
Note the use of fromString and toString from Data.ByteString.UTF8 dated
back to commit 9b93278e8a. Back then it
was using the dataenc package for base64, which operated on Word8 and
String. But with the switch to sandi, it uses ByteString, and indeed
fromB64' and toB64' were already using ByteString without that
complication. So I think there is no risk of such an encoding related
breakage.

I also tested the case that 9b93278e8a
fixed:

	git-annex metadata -s foo='a …' x
	git-annex metadata x
	metadata x
	  foo=a …

In Remote.Helper.Encryptable, it was avoiding using Utility.Base64
because of that UTF8 conversion. Since that's no longer done, it can
just use it now.
2023-10-26 13:10:05 -04:00
beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec
a913e3587b 2023-10-26 04:19:36 +00:00
Joey Hess
985dd38847
add 2023-10-25 14:44:57 -04:00
Joey Hess
626622da1b
comment 2023-10-25 14:07:16 -04:00
Joey Hess
97403a4b4b
comment 2023-10-25 13:30:19 -04:00
Joey Hess
9a1e8fbabc
Merge branch 'master' of ssh://git-annex.branchable.com 2023-10-25 13:21:12 -04:00
Joey Hess
3a3a5a2fd3
remove an out of date comment
The mentioned todo was closed.
2023-10-25 13:08:01 -04:00
Joey Hess
c9866d2164
speed up populating the importfeed database
Avoid conversion from ByteString to String for urls that will just be
converted right back to ByteString to go into the database.

Also setTempUrl is not used by importfeed, so avoid checking for temp
urls in this code path.

This benchmarks as only a small improvement. From 2.99s to 2.78s
when populating a database with 33k urls.

Note that it does not seem worth replacing URLString with URLByteString
generally, because the ways urls are used all entails either parseURI,
which takes a string, or passing a parameter to eg curl, which also is
currently a string.

Sponsored-by: Leon Schuermann on Patreon
2023-10-25 13:00:17 -04:00
nobodyinperson
1d1864ee5e Brainstorm (semi)automatic description updating 2023-10-25 11:27:17 +00:00
Joey Hess
aaeadc422a
comment 2023-10-24 13:54:31 -04:00
Joey Hess
0da1d40cd4
Improve memory use of --all when using annex.private
This does not improve Annex.Branch.files at all, since it still uses ++ to
combine the lists, so forcing all but the last one.

But when there are a lot of files in the private journal, it does avoid
--all (or a bare repo) from buffering the filenames in memory.

See commit 653b719472 for prior discussion of
this buffering.

Sponsored-by: Graham Spencer on Patreon
2023-10-24 13:20:55 -04:00
v@80d733f66c2e675c27ee39df565ebed8d0ea12de
18f902efa9 Added a comment: Minimum required version 2023-10-24 14:27:25 +00:00
v@80d733f66c2e675c27ee39df565ebed8d0ea12de
57fdb44817 Added a comment: Outdated? 2023-10-24 13:26:44 +00:00
Joey Hess
8bde6101e3
sqlite datbase for importfeed
importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or would entagle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.

Sponsored-by: unqueued on Patreon
2023-10-23 16:46:22 -04:00
Joey Hess
df4a60e28d
update 2023-10-23 14:18:49 -04:00
Joey Hess
105fb7e47d
update 2023-10-23 14:12:18 -04:00
Joey Hess
ae4ed08fc6
Merge branch 'master' of ssh://git-annex.branchable.com 2023-10-23 13:56:22 -04:00
Joey Hess
12e751bbb6
improve docs 2023-10-23 13:55:53 -04:00
Manager2103
847e9b4a3c Added a comment 2023-10-23 17:50:22 +00:00
Joey Hess
70d11a209d
notabug 2023-10-23 13:46:46 -04:00
matrss
efad352738 2023-10-23 15:16:18 +00:00
branch
079323e846 Added a comment 2023-10-23 09:57:00 +00:00
Manager2103
fe3b85a744 Added a comment 2023-10-21 17:22:53 +00:00
Joey Hess
5d807fcac0
notabug 2023-10-21 12:27:24 -04:00
Joey Hess
434f70f2ce
Merge branch 'master' of ssh://git-annex.branchable.com 2023-10-21 12:25:19 -04:00
Joey Hess
22e3cf03c8
comment 2023-10-20 14:02:37 -04:00
Joey Hess
6a61c7ff45
Fix crash of enableremote when the special remote has embedcreds=yes
The crash occurred because writeCreds got called twice, and writeFileProtected
neglected to close its file handle, so the file was open for write when
written the second time.

It seems unncessary and suboptimal that writeCreds gets called twice.
One call is from getRemoteCredPair and the other from setRemoteCredPair'.
What happens is that in the enableremote case, code that also runs at
initremote does unncessary work. Might be possible to improve that, but
I've gone for the simple fix.

Sponsored-by: k0ld on Patreon
2023-10-20 13:19:12 -04:00
Joey Hess
37c299ae6c
remove vim dev hack
This caused infinite recursion when I just used it, the dev target has
changed to re-run make. Anyway, not needed now.
2023-10-20 12:50:31 -04:00
Manager2103
fbb29e29de 2023-10-20 16:14:24 +00:00
Manager2103
04e76d138f 2023-10-20 16:13:30 +00:00
Manager2103
017dfd2385 2023-10-20 16:12:03 +00:00
Manager2103
7d992dff46 2023-10-20 16:10:54 +00:00
beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec
4e06d64158 Added a comment 2023-10-17 10:28:51 +00:00
beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec
7967de8fc0 Added a comment 2023-10-17 10:16:24 +00:00