Commit graph

106 commits

Author SHA1 Message Date
Joey Hess
f1c2e18b8d
improve attribution armoring
Split out an author parameter, will make it easier to add authors and
reads better.

Got rid of the function without the copyright year, because an adversary
could have mechanically changed the function with a copyright year to
the one without, and so bypassed the protection of LLM copyright
year hallucination.

Sponsored-by: Luke T. Shumaker on Patreon
2023-11-21 11:34:21 -04:00
Joey Hess
d5d570a96c
avoid replacing otherwise
While authorJoeyHess is True same as otherwise, ghc's exhastiveness
checker turns out to special case otherwise. So this avoids warnings.
2023-11-20 20:25:51 -04:00
Joey Hess
cda3e85164
make my authorship explicit in the code
This is intended to guard against LLM code theft, which is the current
bubble technology de jour.

Note that authorJoeyHess' with a year older than the year I began
developing git-annex will behave badly, by intention. Eg, it will spin
and eventually crash.

This is not the first anti-LLM protection in git-annex. For example see
9562da790f. That method, while much harder
for an adversary to detect and remove, also complicates code somewhat
significantly, and needs extensions to be enabled. There are also
probably significantly fewer ways to implement that method in Haskell.
This new approach, by contrast, will be easy to add throughout the code
base, with very little effort, and without complicating reading or
maintaining it any more than noticing that yes, I am the author of this
code.

An adversary could of course remove all calls to these functions
before feeding code into their LLM-based laundry facility. I think this
would need to be done manually, or with the help of some fairly advanced
Haskell parsing though. In some cases, authorJoeyHess needs to be
removed, while in other places it needs to be replaced with a value.
Also a monadic use of authorJoeyHess' may involve other added monadic
machinery which would need to be eliminated to keep the code compiling.

Alternatively, an adversary could replace my name with something
innocuous. This would be clear intent to remove author attribution
from my code, even more than running it through an LLM laundry is.

If you work for a large company that is laundering my code through an
LLM, please do us a favor and use your immense privilege to quit and go
do something socially beneficial. I will not explain further
developments of this code in such detail, and you have better things to
do than playing cat and mouse with me as I explore directions such as
extending this approach to the type level.

Sponsored-by: k0ld on Patreon
2023-11-20 12:29:12 -04:00
Joey Hess
2b5fa091e2
annex.maxextensionlength for view
view: Support annex.maxextensionlength when generating filenames for the
view branch.

Note that refining an existing view will reuse the extension length that was
configured when initially constructing the view. This is necessarily the case
because it reuses the filenames.

Also view files used to have all extensions at the end, no matter how
many there were. Since annex.maxextensionlength's documentation includes
that it's limited to 2 extensions, I made it consistent with that.

Sponsored-by: k0ld on Patreon
2023-03-24 14:01:38 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Joey Hess
b2c48fb86b
Fix using lookupkey inside a subdirectory
Caused by dirContains ".." "foo" being incorrectly False.

Also added a test of dirContains, which includes all the previous bug fixes
I could find and some obvious cases.

Reversion in version 8.20211011

Sponsored-by: Brett Eisenberg on Patreon
2021-10-26 15:00:45 -04:00
Joey Hess
b2efbd1cd3
fix bug in dirContains
dirContains "." ".." was incorrectly true
because normalize ("." </> "..") = ".."

Sponsored-by: Jochen Bartl on Patreon
2021-10-01 13:53:21 -04:00
Joey Hess
e8959617b6
fix bug in dirContains
dirContains ".." "../.." was incorrectly True.

This does not seem to be an exploitable security hole, at least
as dirContains is used in git-annex.

Sponsored-by: Jochen Bartl on Patreon
2021-10-01 13:15:52 -04:00
Joey Hess
9595a247ae
fix test suite failure on windows
This was maybe a real bug too, although I don't know what circumstances
it would be a problem. See comment for analysis of this windows drive
letter wackyness issue.

Sponsored-by: Brock Spratlen on Patreon
2021-09-01 11:32:25 -04:00
Joey Hess
97129388d5
support fuzzy matching of addon commands
Note this does find things in PATH that are not executable.
Like searchPath use, the executable bit is not checked. Thing is,
there does not seem to be a binding for access(), which would be the
right way to check that the right execute bit is set. Anyway, if it's in
PATH and it's a file, it's probably fine to treat it as something that
was intended to be executable.

This commit was sponsored by Brock Spratlen on Patreon.
2021-02-02 19:37:09 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
Joey Hess
99ba471209
rewrite prop_relPathDirToFileAbs_basics
This was not a good test, it broke the requirement that
relPathDirToFileAbs take absolute paths. And it failed when the two
input paths were eg, the same but differently normalized.

Replaced with some tests of the real basics of that function.
2021-01-13 13:23:26 -04:00
Joey Hess
4c1eb28c40
fix build on windows 2020-11-10 11:21:03 -04:00
Joey Hess
885974be99
add newtypes for QuickCheck to avoid LANG=C issues
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.

A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
2020-11-09 20:21:18 -04:00
Joey Hess
907a0bcad6
avoid providing filename with NUL to quickcheck properties
instance Arbitrary [Char] allows that, and it's not a legal part of a
filename so can break processing them.

Noticed when prop_view_roundtrips failed.

The instance Arbitrary AssociatedFile avoids this problem.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-11-06 15:15:33 -04:00
Joey Hess
5a1e73617d
finished this stage of the RawFilePath conversion
Finally compiles again, and test suite passes.

This commit was sponsored by Brock Spratlen on Patreon.
2020-11-04 14:20:37 -04:00
Joey Hess
b724236b35
remove unused imports 2020-11-02 15:36:11 -04:00
Joey Hess
d6e94a6b2e
got configure working after Utility.Path ByteString conversion
Had to split out some modules because getWorkingDirectory needs unix,
which is not a build-dep of configure.

This commit was sponsored by Brock Spratlen on Patreon.
2020-10-28 15:01:19 -04:00
Joey Hess
e219aadbab
convert to RawByteString
This will break a lot of stuff that uses it, but once fixed should lead
to better performance.

Mostly mechanical.
Changes of note:

* upFrom now uses isPathSeparator, which is better on Windows where
  there is not just one
* splitShortExtensions used to take the length of a string,
  which would count wide unicode characters as a single character.
  Changing to B.length changes that. Note that, git-annex's
  annexMaxExtensionLength already changed to the length in bytes
  before this change. This function is only used in generating views,
  and the small behavior change should not be a problem.
* relHome still uses FilePath because it didn't seem worth changing(?)

This commit was sponsored by Jack Hill on Patreon.
2020-10-28 14:32:45 -04:00
Joey Hess
9a5cd96f0d
Fix a memory leak introduced in the last release
The problem was this line:

	cleanup = and <$> sequence (map snd v)

That caused all of v to be held onto until the end, when the cleanup action
was run.

I could not seem to find a bang pattern that avoided the leak, so I
resorted to a IORef, rather clunky, but not a performance problem because
it will only be written once per git ls-files, so typically just 1 time.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-10-13 16:31:01 -04:00
Joey Hess
f624876dc2
remove zombie process in file seeking
This was the last one marked as a zombie. There might be others I don't
know about, but except for in the hypothetical case of a thread dying
due to an async exception before it can wait on a process it started, I
don't know of any.

It would probably be safe to remove the reapZombies now, but let's wait
and so that in its own commit in case it turns out to cause problems.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-25 11:38:42 -04:00
Joey Hess
5117ae8aec
fix build warning 2020-09-25 11:07:41 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
4c9ad1de46
optimisation: stream keys through git cat-file --buffer
This is only implemented for git-annex get so far. It makes git-annex
get nearly twice as fast in a repo with 10k files, all of them present!

But, see the TODO for some caveats.
2020-07-10 13:54:52 -04:00
Joey Hess
83df96d1f5
fix build on windows 2020-06-30 11:01:28 -04:00
Joey Hess
2a8fdfc7d8
Display a warning message when asked to operate on a file inside a directory that's a symbolic link to elsewhere
This relicates git's behavior. It adds a few stat calls for the command
line parameters, so there is some minor slowdown, but even with thousands
of parameters it will not be very noticable, and git does the same statting
in similar circumstances.

Note that this does not prevent eg "git annex add symlink"; the symlink
will be added to git as usual. And "git annex find symlink" will silently
list nothing as well. It's only "symlink/foo" or "subdir/symlink/foo" that
triggers the warning.
2020-05-11 15:03:35 -04:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
662e5a5db9
small opt to absPath
Noticed that it gets the CWD unncessarily when the path is absolute.

I have not benchmarked this, but I guess that the small overhead of
isAbsolute is so tiny compared to the system call that it's worth
it even if most of the time relative paths are passed to absPath.
2020-03-05 13:52:30 -04:00
Joey Hess
444d5591ee
Improve file ordering behavior when one parameter is "." and other parameters are other directories
eg, `git-annex get . ..` used to order the files strangly, because it
did not realize that when git ls-files output eg "foo", that should be
grouped with the first set of files and not the second set.

Fixed by making            dirContains "." "./foo" = True
which makes sense, because dirContains ".." "../foo" = True
2019-12-20 18:01:29 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
8ea5f3ff99
explict export lists
Eliminated some dead code. In other cases, exported a currently unused
function, since it was a logical part of the API.

Of course this improves the API documentation. It may also sometimes
let ghc optimize code better, since it can know a function is internal
to a module.

364 modules still to go, according to
git grep -E 'module [A-Za-z.]+ where'
2019-11-21 16:08:37 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
c3449ff91d
finish fix for gitAnnexLink on windows
dropDrive needed since if splitPath splits out the drives, they would
appear different.
2017-10-26 12:01:16 -04:00
Joey Hess
0ae2ac282e
fix gitAnnexLink to not be absolute on Windows
Windows: Fix reversion that caused the path used to link to annexed
content include the drive letter and full path, rather than being
relative. (`git annex fix` will fix up after this problem).

I've not identified the commit that brought the reversion (probably it
happened this spring when I was removing MisingH and last touched
Utility.Path). Likely commit 18b9a4b8024115db67ae309fdaf54e1553037529?

The problem is that relPathDirToFile got called two paths that had the
slashes different ways around. Since takeDrive includes the first slash,
this made two paths on the same drive seem different and it bailed.

(ifdefs around this to avoid doing extra work on non-windows)

This commit was sponsored by Jack Hill on Patreon.
2017-10-25 19:36:29 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
18b9a4b802
remove absNormPathUnix again
Moving toward dropping MissingH dep.

I think I've addressed the problem identified earlier in
09a66f702d. On Windows,
absPathFrom "/tmp/repo/xxx" "y/bar" would be "/tmp/repo/xxx\\y/bar",
which then confuses relPathDirToFile. Fixed by converting to unix (git)
style paths.

Also, relPathDirToFile was splitting only on \\ on windows and not /
which broke the example in 09a66f702d of
relPathDirToFile (absPathFrom "/tmp/repo/xxx" "y/bar") "/tmp/repo/.git/annex/objects/xxx"

Now, on windows, that will yield "..\\..\\..\\.git/annex/objects/xxx"
which once converted to unix style paths is what we want.
2017-05-15 21:35:35 -04:00
Joey Hess
5358fb992a
Windows: Improve handling of shebang in external special remote program, searching for the program in the PATH.
findShellCommand needs a full path to a file in order to check it for a
shebang on Windows. It was being run with only the base name of the external
special remote program, which would only work when it was in the current
directory.

This is why users in
https://github.com/DanielDent/git-annex-remote-rclone/pull/10 and elsewhere
were complaining that the previous improvements to git-annex didn't make
git-remote-rclone work on Windows.

Also, reworked checkearlytermination, which while it worked, seemed
to rely on a race condition. And, improved its error messages.

This commit was sponsored by Shane-o on Patreon.
2017-03-08 15:59:00 -04:00
Joey Hess
b22409db38
avoid warnings about not exported System.Directory.isSymbolicLink 2016-04-28 15:18:11 -04:00
Joey Hess
5fe450514b
Fix build with directory-1.2.6.2.
It started exporting a isSymbolicLink which supports windows. But,
git-annex does no use symlinks on windows yet and this conflicts with the
function by the same name from unix-compat, so hide it.
2016-04-28 13:18:44 -04:00
Joey Hess
2aa3e73de1
remove now unused toCygPath 2016-01-13 12:36:53 -04:00
Pieter Kitslaar
6cd134ade1 Added new toMSYS2Path function for use with rsync on Windows. 2016-01-11 11:18:58 +01:00
Joey Hess
aa4f353e5d
clarify absPathFrom
The repo path is typically relative, not absolute, so
providing it to absPathFrom doesn't yield an absolute path.
This is not a bug, just unclear documentation.

Indeed, there seem to be no reason to simplifyPath here, which absPathFrom
does, so instead just combine the repo path and the TopFilePath.

Also, removed an export of the TopFilePath constructor; asTopFilePath
is provided to construct one as-is.
2016-01-05 17:33:48 -04:00
Joey Hess
969d54f914
cleanup 2015-12-06 16:36:35 -04:00
Joey Hess
04e150abb3
use intercalate instead of MissingH's join
The two functions are identical.
2015-11-17 17:27:24 -04:00
Joey Hess
3cff287b26 proxy: Fix behavior when run in subdirectory of git repo.
This fixes a reversion introduced by relative path changes back last winter.

The root cause is simplifyPath "../foo" was incorrectly yielding "foo".

absPathFrom seems quite horrible. Probably most things that use it should
use </> instead.
2015-08-04 14:58:21 -04:00
Joey Hess
ed4fe02896 disable horrible tab warning, needed in every file that Setup.hs pulls in
This is certianly a cabal bug for not passing the build options in the
cabal file when building Setup.hs.

And, why oh why did ghc enable this warning by default? So unhappy with
this choice.
2015-05-10 16:31:50 -04:00
Joey Hess
ec267aa1ea rejigger imports for clean build with ghc 7.10's AMP changes
The explict import Prelude after import Control.Applicative is a trick
to avoid a warning.
2015-05-10 16:20:30 -04:00
Joey Hess
cd31b69ff6 don't test with null paths 2015-04-14 15:15:29 -04:00
Joey Hess
f84ccaa4e8 fix relPathDirToFileAbs on windows with different drive letters
Since we started using this for git repos, when a remote was on another
drive, it resulted in a bogus relative path to it being used by git-annex,
which didn't work.
2015-04-14 14:16:44 -04:00
Joey Hess
294991dacb Significantly sped up processing of large numbers of directories passed to a single git-annex command. (try 2)
New approach is to do it the expensive way for the first 100 paths
on the command line, but then assume the user doesn't care about order too
much and fall back to the cheap way that does not preserve order.
2015-04-02 01:44:32 -04:00