Commit graph

1631 commits

Author SHA1 Message Date
Joey Hess
7298123520
build git trees using ContentIdentifier to speed up import
This gets the trees built, but it does not use them. Next step will be
to remember the tree for next time an import is done, and diff between
old and new trees to find the files that have changed.

Added --missing to the mktree parameters. That only disables a check, so
it's ok to do everywhere mktree is used. It probably also speeds up
mktree to disable the check.

Note that git fsck does not complain about the resulting tree objects
that point to shas that are not in the repository. Even with --strict.

A quick benchmark, importing 10000 files, this slowed it down
from 2:04.06 to 2:04.28. So it will more than pay for itself.

Sponsored-by: Luke Shumaker on Patreon
2023-05-31 12:46:54 -04:00
Joey Hess
0da0e2efcc
add git config debugging
(and process cwd debugging)

Sponsored-by: Dartmouth College's Datalad project
2023-05-15 15:35:29 -04:00
Joey Hess
9812d9aaec
support aeson for Map
Make unused --json use it, which is better than the doubly nested lists
it was using.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 13:51:37 -04:00
Joey Hess
57c1b4f5e5
initremote: Avoid creating a remote that is not encrypted when gpg is broken
checksize was applied lazily, so the exception didn't happen until the
remote was set up.

Sponsored-by: k0ld on Patreon
2023-05-01 13:00:05 -04:00
Joey Hess
aff37fc208
avoid annexFileMode special case
This makes annexFileMode be just an application of setAnnexPerm',
which avoids having 2 functions that do different versions of the same
thing.

Fixes some buggy behavior for some combinations of core.sharedRepository
and umask.

Sponsored-by: Jack Hill on Patreon
2023-04-27 15:58:37 -04:00
Joey Hess
2efceba789
fix windows build 2023-04-12 19:33:19 -04:00
Joey Hess
c5bcb55a8b
add ScopedTypeVariables 2023-04-12 19:19:22 -04:00
Joey Hess
6fc999193f
avoid displaying ExitCode exceptions
Don't need to be sanitized and displaying them messes up actually
exiting with the right exit code! And broke the test suite.

Sponsored-by: Brett Eisenberg on Patreon
2023-04-12 17:04:57 -04:00
Joey Hess
2fdb6ca879
remove unused imports 2023-04-12 16:48:18 -04:00
Joey Hess
a576fc3b12
fix mojibake reversion in display of utf8
When displaying a ByteString like "💕", safeOutput operates on
individual bytes like "\240\159\146\149" and isControl '\146' = True,
so it got truncated to just "\240".

So, only treat the low control characters, and DEL, as control
characters.

Also split Utility.Terminal out of Utility.SafeOutput. The latter needs
win32, but Utility.SafeOutput is used by Control.Exception, which is
used by Setup.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-12 13:53:30 -04:00
Joey Hess
31111d15c8
newline and tab are safe control characters
Oops, let's let git-annex display those! Lol
2023-04-11 15:38:47 -04:00
Joey Hess
afa5b883dc
find, findkeys, examinekey: escape output to terminal when --format is not used
Note that filenames are not quoted, only escaped. This is to match the
output of --format with escaping.

Sponsored-by: Lawrence Brogan on Patreon
2023-04-11 15:27:07 -04:00
Joey Hess
df6f9f1ee8
filter out control characters and quote filenames
Searched for uses of putStr and hPutStr and changed appropriate ones to filter
out control characters and quote filenames.

This notably does not make find and findkeys quote filenames in their default
output. Because they should only do that when stdout is non a pipe.

A few commands like calckey and lookupkey seem too low-level to make sense to filter
output, so skipped those.

Also when relaying output from other commands that is not progress output,
have git-annex filter out control characters.

Sponsored-by: k0ld on Patreon
2023-04-11 14:27:22 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Joey Hess
81bc57322f
clean up 2023-04-07 17:20:58 -04:00
Joey Hess
d9b6be7782
convert encode_c to ByteString
This turns out to be possible after all, because the old one decomposed
a unicode Char to multiple Word8s and encoded those. It should be faster
in some places, particularly in Git.Filename.encodeAlways.

The old version encoded all unicode by default as well as ascii control
characters and also '"'. The new one only encodes ascii control
characters by default.

That old behavior was visible in Utility.Format.format, which did escape
'"' when used in eg git-annex find --format='${escaped_file}\n'
So made sure to keep that working the same. Although the man page only
says it will escape "unusual" characters, so it might be able to be
changed.

Git.Filename.encodeAlways also needs to escape '"' ; that was the
original reason that was escaped.

Types.Transferrer I judge is ok to not escape '"', because the escaped
value is sent in a line-based protocol, which is decoded at the other
end by decode_c. So old git-annex and new will be fine whether that is
escaped or not, the result will be the same.

Note that when asked to escape a double quote, it is escaped to \"
rather than to \042. That's the same behavior as git has. It's
perhaps somehow more of a special case than it needs to be.

Sponsored-by: k0ld on Patreon
2023-04-07 17:10:49 -04:00
Joey Hess
371d4f8183
decode_c converted to ByteString
This speeds up a few things, notably CmdLine.Seek using Git.Filename
which uses decode_c and this avoids a conversion to String and back,
and probably the ByteString implementation of decode_c is also faster
for simple cases at least than the string version.

encode_c cannot be converted to ByteString (or if it did, it would have
to convert right back to String in order to handle unicode).

Sponsored-by: Brock Spratlen on Patreon
2023-04-07 14:44:19 -04:00
Joey Hess
d2d7d766d8
fix comment 2023-03-28 12:40:08 -04:00
Joey Hess
cd076cd085
Windows: Support urls like "file:///c:/path"
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.

Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.

An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.

It's also worth noting that System.FilePath.normalize "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-03-27 13:38:02 -04:00
Joey Hess
79d00a3b9f
fix build warning on windows 2023-03-27 12:17:55 -04:00
Joey Hess
2b5fa091e2
annex.maxextensionlength for view
view: Support annex.maxextensionlength when generating filenames for the
view branch.

Note that refining an existing view will reuse the extension length that was
configured when initially constructing the view. This is necessarily the case
because it reuses the filenames.

Also view files used to have all extensions at the end, no matter how
many there were. Since annex.maxextensionlength's documentation includes
that it's limited to 2 extensions, I made it consistent with that.

Sponsored-by: k0ld on Patreon
2023-03-24 14:01:38 -04:00
Joey Hess
e822df2a09
fix build warnings on windows 2023-03-21 18:41:23 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Yaroslav Halchenko
100f5aabb6
Typo: recurrance -> recurrence 2023-03-17 15:14:54 -04:00
Yaroslav Halchenko
0ae5ff797f
Typo: sansative -> sensitive 2023-03-17 15:14:50 -04:00
Yaroslav Halchenko
e018ae1125
Fix ambigous typos 2023-03-17 15:14:47 -04:00
Joey Hess
1eb64f0249
fix OSx build better 2023-03-09 11:43:55 -04:00
Joey Hess
0dac6411d8
fix build on OSX
Breakage from 54ad1b4cfb
2023-03-08 12:17:09 -04:00
Joey Hess
ba2d51ca80
remove redundant import 2023-03-08 11:41:20 -04:00
Joey Hess
433608753a
further fix windows build
Thanks to jkniiv for this patch.
2023-03-06 12:15:53 -04:00
Joey Hess
398633c12b
fix build on windows 2023-03-03 12:58:39 -04:00
Joey Hess
33a05a1a91
small RawFilePath optimisation 2023-03-02 10:53:12 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
505f1a654b
avoid using Utility.Path.AbsRel in Utility.Path.Windows
AbsRel depends on unix, but Utility.Path.Windows will be used in some
libraries that are part of the setup-depends, which cannot depend on
unix.

The only reason that AbsRel uses getWorkingDirectory on unix is that
it returns RawFilePath. getCurrentDirectory returns FilePath and so
needs a conversion to RawFilePath. Looks like a newer version of
directory will fix that, by using OsPath, so eventually AbsPath should
be able to switch to using getCurrentDirectory on unix, and then the
small code duplication in this commit won't be needed.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 14:01:33 -04:00
Joey Hess
3c08af0da1
factor out convertToWindowsNativeNamespace into its own module
Gonna use this more widely.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 13:28:39 -04:00
Joey Hess
9b1fe37818
improve adjusted branch name parsing to support adjusted view branches
An adjusted view branch has a name like
"adjusted/views/master(author=_)(unlocked)"
and so the adjustment starts at the last open paren, not the first open
paren.

Note that git-annex sync still does not do anything useful when run in
such a branch, because it does not realize that it is a view branch.
This is only groundwork for adjusted view branches.

This also fixes adjusted branches when the basis branch name contains
parens for some other reason, though that is not common in a git branch
name.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2023-02-27 14:09:05 -04:00
Joey Hess
96d46db2d5
Support http urls that contain ":" that is not followed by a port number
The same as git does.

Sponsored-by: Dartmouth College's DANDI project
2023-02-10 13:34:47 -04:00
Joey Hess
d079fb0578
work around ldd crash on arm64 autobuilder
This does mean it has to run ldd once per executable, which is actually
quite a number of times, so will be a bit slower.

Sponsored-by: Luke Shumaker on Patreon
2023-01-28 14:26:01 -04:00
Joey Hess
a00513754b
support old libgcc1 name
Used in Debian jessie, which the i386ancient autobuilder uses.
2023-01-26 15:37:21 -04:00
Joey Hess
f316b7f105
Revert "Removed the vendored git-lfs and the GitLfs build flag"
This reverts commit efda811404.

Turns out that datalad is building git-annex against debian bullseye.
https://github.com/datalad/git-annex/issues/149
2023-01-04 17:33:29 -04:00
Joey Hess
efda811404
Removed the vendored git-lfs and the GitLfs build flag
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.

Debian stable does have a too old haskell-git-lfs package to be able to
build git-annex from source, but there is not currently a backport of a
recent git-annex to Debian stable. And if they update the backport at some
point, they should be able to backport the library too.

Sponsored-by: Svenne Krap on Patreon
2022-12-26 12:49:53 -04:00
Joey Hess
d475f82c62
Added libgcc_s.so.1 to the linux standalone build so pthread_cancel will work
In Makefile, listed additional deps of Build/Standalone. Without that,
it does not get updated for the change to Utility/LinuxMkLibs.hs when
compiling incrementally.

Sponsored-by: Dartmouth College's DANDI project
2022-12-22 15:15:25 -04:00
Joey Hess
2a3958cbe1
new metric prefixes give us quettabyte and yottabyte
There does not yet seem to be an update to IEC 80000-13, so no
robibyte or quebibyte yet.

Sponsored-by: k0ld on Patreon
2022-11-18 12:03:12 -04:00
Joey Hess
2eedf58630
replace guessed win32 version with actual version 2.13.4.0 2022-11-09 13:08:27 -04:00
Joey Hess
9187a37dec
fix build warning 2022-10-27 10:21:24 -04:00
Joey Hess
14f7a386f0
Make git-annex enable-tor work when using the linux standalone build
Clean the standalone environment before running the su command
to run "sh". Otherwise, PATH leaked through, causing it to run
git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set,
which caused that script to not work:

[2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"]
/home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found

Changed programPath to not use GIT_ANNEX_PROGRAMPATH,
but instead run the scripts at the top of GIT_ANNEX_DIR.
That works both when the standalone environment is set up, and when it's
not.

Sponsored-by: Kevin Mueller on Patreon
2022-10-26 15:45:08 -04:00
Joey Hess
f1c85ac11b
fix breakage in wormhole's sendFile
Commit ff0927bde9 broke this, it made it
try to read all of the input before looking for the code. But, wormhole
keeps running until it sends the file, so that caused a deadlock. Oops.

Sponsored-by: Luke Shumaker on Patreon
2022-09-26 15:26:29 -04:00
Joey Hess
17129fed66
fix wormhole --appid option position
p2p: Pass wormhole the --appid option before the receive/send command, as
it does not accept that option after the command

I'm left wondering, did I get this wrong from the beginning, or did
wormhole change its option parser? I'm reminded of the change in 0.8.2
where it silently changed what FD the pairing code was output to.
But, looking at the wormhole source, it was at least putting --appid before
send in its test suite from the introduction of the option.

So I think probably this has always been broken. On 2021-12-31 the --appid
option was enabled, and it took until now for someone to try
git-annex p2p --pair and notice that flag day broke it..

Sponsored-by: Svenne Krap on Patreon
2022-09-26 15:11:38 -04:00
Joey Hess
1fe9cf7043
deal with ignoreinode config setting
Improve handling of directory special remotes with importtree=yes whose
ignoreinode setting has been changed. (By either enableremote or by
upgrading to commit 3e2f1f73cbc5fc10475745b3c3133267bd1850a7.)

When getting a file from such a remote, accept the content that would have
been accepted with the previous ignoreinode setting.

After a change to ignoreinode, importing a tree from the remote will
re-import and generate new content identifiers using the new config. So
when ignoreinode has changed to no, the inodes will be learned, and after
that point, a change in an inode will be detected as a change. Before
re-importing, a change in an inode will be ignored, as it was before the
ignoreinode change. This seems acceptble, because the user can re-import
immediately if they urgently need to add inodes. And if not, they'll
do it sometime, presumably, and the change will take effect then.

Sponsored-by: Erik Bjäreholt on Patreon
2022-09-16 14:11:25 -04:00
Joey Hess
8a4cfd4f2d
use getSymbolicLinkStatus not getFileStatus to avoid crash on broken symlink
Fix crash importing from a directory special remote that contains a broken
symlink.

The crash was in listImportableContentsM but some other places in
Remote.Directory also seemed like they could have the same problem.

Also audited for other places that have such a problem. Not all calls
to getFileStatus are bad, in some cases it's better to crash on something
unexpected. For example, `git-annex import path` when the path is a broken
symlink should crash, the same as when it does not exist. Many of the
getFileStatus calls are like that, particularly when they involve
.git/annex/objects which should never have a broken symlink in it.

Fixed a few other possible cases of the problem.

Sponsored-by: Lawrence Brogan on Patreon
2022-09-05 13:46:32 -04:00