IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.
It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unnecessary.
Sponsored-by: Dartmouth College's DANDI project
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)
Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.
AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.
Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.
Sponsored-by: Shae Erisson on Patreon
Eg, showImprecise 1 1.99 returned "1.1" rather than "2". The 9 rounded
upward to 10, and that was wrongly used as the decimal, rather than
carrying the 1.
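A minimal sketch of the carry, assuming non-negative input. The names in the
where clause are made up for illustration, and this is not necessarily the
code as committed:

```haskell
-- Sketch only: show a non-negative number with at most `precision`
-- decimal digits, carrying into the integer part when the fraction
-- rounds all the way up to 10 ^ precision.
showImprecise :: RealFrac a => Int -> a -> String
showImprecise precision n
    | precision == 0 = show (round n :: Integer)
    | remainder' == 0 = show int'
    | otherwise = show int' ++ "." ++ stripTrailing0s (pad0s (show remainder'))
  where
    int :: Integer
    (int, frac) = properFraction n
    remainder :: Integer
    remainder = round (frac * 10 ^ precision)
    (int', remainder')
        -- carry the 1 rather than using 10 as the decimal
        | remainder == 10 ^ precision = (int + 1, 0)
        | otherwise = (int, remainder)
    pad0s s = replicate (precision - length s) '0' ++ s
    stripTrailing0s = reverse . dropWhile (== '0') . reverse
```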
Sponsored-by: Jack Hill on Patreon
Clear visible progress bar first.
Removed showSideActionAfter because it can't be used in reconcileStaged
(import loop). Instead, it counts the number of files it
processes and displays the message once it's seen enough of them to know
it's taking a while.
Sponsored-by: Dartmouth College's Datalad project
Avoids users thinking this scan is a big deal, when it's not in the
majority of repos.
showSideActionAfter has some ugly caveats, since it has to display in
the background of another action. I could not see a better way to do it
and it works fine in this particular case. It also doesn't really belong
in Annex.Concurrent, but cannot go in Messages due to an import loop.
Sponsored-by: Dartmouth College's Datalad project
ghc 8.8.4 seems to have changed something that broke code that has been
successfully using forkProcess since 2012. Likely a change to GC internals.
Since forkProcess has never had clear documentation about how to
use it safely, avoid using it at all. Instead, when git-annex needs to
daemonize itself, re-run the git-annex command, in a new process group
and session.
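A rough sketch of how such a re-run can be described with the process
library, assuming a version that has the new_session field (1.3 or later).
daemonSpec is a hypothetical name, not the function git-annex uses:

```haskell
import System.Process (CreateProcess(..), proc)

-- Sketch only: describe re-running a command detached from the caller,
-- instead of calling forkProcess.
daemonSpec :: FilePath -> [String] -> CreateProcess
daemonSpec exe params = (proc exe params)
    { create_group = True  -- run in a new process group
    , new_session = True   -- run in a new session (setsid on POSIX)
    }
```

The result would be passed to createProcess, with exe coming from something
like System.Environment.getExecutablePath.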
This commit was sponsored by Luke Shumaker on Patreon.
Fix behavior of several commands, including reinject, addurl, and rmurl
when given an absolute path to an unlocked file, or a relative path that
leaves and re-enters the repository.
To avoid slowing down all the cases where the paths are already ok
with an unnecessary call to getCurrentDirectory, put in an optimisation
in relPathCwdToFile. That will probably also speed up other parts of
git-annex by some small amount, but I have not benchmarked.
Note that I did not convert branchFileRef, because it seems likely that
it will be used with a file that is not provided by the user, so is already
in a sane format. This is certainly true for the way git-annex uses it,
though maybe arguable to the extent Git.Ref is a reusable library.
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debugging about running processes.
The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.
Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
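To illustrate the idea of selecting by source rather than by level, here is
a simplified sketch. The types use String for the module name, and debugFrom
and debugMessage are hypothetical helpers, so this should not be read as the
actual interface:

```haskell
-- Hypothetical simplification: a debug source is just a module name.
newtype DebugSource = DebugSource String

-- Decides which sources have their debug messages displayed.
newtype DebugSelector = DebugSelector (DebugSource -> Bool)

-- Select only the named modules, as an option like
-- --debug-from=Process might.
debugFrom :: [String] -> DebugSelector
debugFrom names = DebugSelector $ \(DebugSource s) -> s `elem` names

-- Format a message, including its source, if the source is selected.
debugMessage :: DebugSelector -> DebugSource -> String -> Maybe String
debugMessage (DebugSelector selected) src@(DebugSource s) msg
    | selected src = Just ("[" ++ s ++ "] " ++ msg)
    | otherwise = Nothing
```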
New error message:
Remote foo not usable by git-annex; setting annex-ignore
http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1
If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.
This commit was sponsored by Mark Reidenbach on Patreon.
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.
Not tested at all yet, but I imagine it will work!
Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
As yet unused.
Backend.External could perhaps implement it too, although that would
involve sending chunks of data to it via a pipe or something, so likely
to be slow.
Note this does find things in PATH that are not executable.
Like searchPath use, the executable bit is not checked. Thing is,
there does not seem to be a binding for access(), which would be the
right way to check that the right execute bit is set. Anyway, if it's in
PATH and it's a file, it's probably fine to treat it as something that
was intended to be executable.
This commit was sponsored by Brock Spratlen on Patreon.
Fix an oddity in matching options and preferred content expressions such as
"foo (bar or baz)", which was incorrectly handled as if it were "(foo or
bar) and baz" rather than the intended "foo and (bar or baz)".
Seemed like a change to consume should be able to handle this case
better, but I was having trouble writing it that way, so instead added
a separate pass that inserts the implicit ands explicitly. Also added
several test cases to make sure versions with and without explicit ands
generate the same.
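The separate pass can be sketched like this. The Token type here is a
made-up simplification for illustration, not the matcher's real token type:

```haskell
-- Hypothetical token type for a matcher expression.
data Token = Term String | And | Or | Not | Open | Close
    deriving (Show, Eq)

-- Insert an explicit And wherever one operand directly follows another,
-- eg between "foo" and "(" in "foo (bar or baz)".
implicitAnd :: [Token] -> [Token]
implicitAnd (a:b:rest)
    | endsOperand a && startsOperand b = a : And : implicitAnd (b:rest)
    | otherwise = a : implicitAnd (b:rest)
implicitAnd ts = ts

-- Tokens that can end an operand.
endsOperand :: Token -> Bool
endsOperand (Term _) = True
endsOperand Close = True
endsOperand _ = False

-- Tokens that can start an operand.
startsOperand :: Token -> Bool
startsOperand (Term _) = True
startsOperand Open = True
startsOperand Not = True
startsOperand _ = False
```

Running the pass over input that already has explicit ands leaves it
unchanged, which is the property the test cases compare.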
Seems that dropDrive on windows only drops eg c:/ but not a leading /
while on linux, it does drop a leading / (which is what it considers
to be equivalent to a drive letter). I had been relying on it to drop
both. So need to drop leading directory separators.
Also, if the quickcheck generated input is eg "c:c:c:c:foo",
dropDrive will only drop the first one, leaving a path that's
still not relative. So instead of using dropDrive, just remove the
colons from the path.
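A minimal sketch of that approach, with relativize being a hypothetical
name for the helper:

```haskell
import System.FilePath (isPathSeparator)

-- Sketch only: turn arbitrary generated input into a relative path by
-- removing every colon, then dropping leading directory separators.
-- The colons go first, so ":/foo" cannot leave a leading separator.
relativize :: FilePath -> FilePath
relativize = dropWhile isPathSeparator . filter (/= ':')
```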
Directory special remotes with importtree=yes now avoid unnecessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.
A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.
I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.
There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.
This commit was sponsored by Brett Eisenberg on Patreon.
It was just slapping on a path separator to the front of the path to
make it absolute, but on windows, a path like "//foo/bar" actually
has a network "drive" of "//foo" and so that broke the test case.
Since "a:foo" is a somehow relative path on windows
(who knows how), drop any drive from the input. But dropDrive also drops
any leading path separator, making the input path relative. So now
it should be safe to slap on a leading path separator.
Useful for eg, replicating failures in ghci. No need for this to be a
smart constructor, as long as it's used with valid filepaths, it's ok
and if not the test breaks.
This was not a good test, it broke the requirement that
relPathDirToFileAbs take absolute paths. And it failed when the two
input paths were eg, the same but differently normalized.
Replaced with some tests of the real basics of that function.
And vice-versa, but it's better to use '/' for portability.
Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
Adding new instance for Integer, and some parsers for more parameters.
The conversion of readish to readMaybe is done because a serialized
exit code cannot contain additional text after the number.
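The difference can be seen in a small sketch. readLenient stands in for a
reads-based parser along the lines of readish, and parseExitCode is a
hypothetical name:

```haskell
import Text.Read (readMaybe)

-- Lenient: accepts a value followed by arbitrary trailing text.
readLenient :: Read a => String -> Maybe a
readLenient s = case reads s of
    ((x, _):_) -> Just x
    _ -> Nothing

-- Strict: the whole string has to parse as the number.
parseExitCode :: String -> Maybe Int
parseExitCode = readMaybe
```

Given "1 foo", readLenient yields Just 1 while parseExitCode yields Nothing.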
New config annex.stalldetection, remote.name.annex-stalldetection, which
can be used to deal with remotes that stall during transfers, or are
sometimes too slow to want to use.
This commit was sponsored by Luke Shumaker on Patreon.
This old code will now be useful for git-annex beyond the assistant.
git-annex won't use the CheckTransferrer part, and won't run transferkeys
as a batch process, and will want withTransferrer to not shut down
transferkeys processes. Still, the rest of this is a good fit for what I
need now.
Also removed some dead code, and simplified a little bit.
This commit was sponsored by Mark Reidenbach on Patreon.
Json objects not yet handled, and some other special cases, but this is
the bulk of the messages.
For progress meters, POSIXTime does not have a Read instance (or a
suitable Show instance), so had to switch to using a Double for progress
meters.
This commit was sponsored by Ethan Aubin on Patreon.
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.
I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.
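The rule can be sketched as a pure function over the result of a read.
ReadResult and classifyRead are names made up for this illustration:

```haskell
import qualified Data.ByteString as B

-- Outcome of asking a handle for some number of bytes.
data ReadResult = EOF | Chunk B.ByteString
    deriving (Eq, Show)

-- An empty result only indicates EOF when more than 0 bytes were
-- requested. Asking for 0 bytes always yields an empty string.
classifyRead :: Int -> B.ByteString -> ReadResult
classifyRead requested b
    | requested > 0 && B.null b = EOF
    | otherwise = Chunk b
```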
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding (removeLink)
This commit was sponsored by Ethan Aubin.
It does use it on both stdout and stderr. It seems unlikely the problem
could really affect stdout, but the unix implementation of it combines
both into a single handle in any case.
It had lost the hAcceptEncoding header that is set as part of the
overriding of http-client's default decompression of compressed files.
Seems likely that would have caused resuming of compressed files to fail
in some cases.
This commit was sponsored by Brett Eisenberg on Patreon.
Fix a bug that could make resuming a download from the web fail when the
entire content of the file is actually already present locally.
What a mess that Request can throw exceptions or not, depending on how
it's configured. Makes it very hard if you need to handle some specific
http status codes in a function like this! Implementing everything two
ways did not seem appealing, if possible at all, so I decided to
override the Request if it did come configured to throw exception on
non-2xx http status. Other exceptions, like from http-client-restricted,
or due to a redirect to a non-http url, still get thrown.
This commit was sponsored by Luke Shumaker on Patreon.
This removeLink was introduced in commit
e505c03bcc, which replaced code
that used removeFile on Windows. So, I know git-annex did not used to do
anything other than removeFile on Windows. If there were symlinks it
wanted to remove, this would not work on windows, but of course it does
not use symlinks on windows.
Unfortunately, there is no hGetNewLineMode. This seems like an oversight
that should be fixed in ghc, but for now, I paper over it with a windows
hack.
The problem with the old version seemed to be that hWaitForInput blocks
rather than timing out when being run concurrently with hGetLine on the
same handle.
This passes the bench test, and also works when run concurrently on
different handles.
This seems to show that hWaitForInput does not behave as
documented. It does not time out, so blocks forever in this situation.
This is with a 0 timeout and with larger timeouts. Unsure why, it looked
like it should work.
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.
A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
instance Arbitrary [Char] allows that, and it's not a legal part of a
filename so can break processing them.
Noticed when prop_view_roundtrips failed.
The instance Arbitrary AssociatedFile avoids this problem.
This commit was sponsored by Mark Reidenbach on Patreon.
Lots of nice wins from this in avoiding unnecessary work, and I think
nothing got slower.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.
This commit was sponsored by Graham Spencer on Patreon.
Had to split out some modules because getWorkingDirectory needs unix,
which is not a build-dep of configure.
This commit was sponsored by Brock Spratlen on Patreon.
This will break a lot of stuff that uses it, but once fixed should lead
to better performance.
Mostly mechanical.
Changes of note:
* upFrom now uses isPathSeparator, which is better on Windows where
there is not just one
* splitShortExtensions used to take the length of a string,
which would count wide unicode characters as a single character.
Changing to B.length changes that. Note that, git-annex's
annexMaxExtensionLength already changed to the length in bytes
before this change. This function is only used in generating views,
and the small behavior change should not be a problem.
* relHome still uses FilePath because it didn't seem worth changing(?)
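To illustrate the splitShortExtensions length change, here is a small
comparison of the two ways of measuring. It assumes UTF-8 via the text
library purely for the example; git-annex itself uses the filesystem
encoding:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Length in characters, as the String version counted.
charLength :: String -> Int
charLength = length

-- Length in bytes of the UTF-8 encoding, as B.length counts.
byteLength :: String -> Int
byteLength = B.length . TE.encodeUtf8 . T.pack
```

For an extension such as ".\233" (a dot followed by e-acute), charLength
gives 2 while byteLength gives 3.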
This commit was sponsored by Jack Hill on Patreon.
The problem was this line:
cleanup = and <$> sequence (map snd v)
That caused all of v to be held onto until the end, when the cleanup action
was run.
I could not seem to find a bang pattern that avoided the leak, so I
resorted to an IORef, rather clunky, but not a performance problem because
it will only be written once per git ls-files, so typically just 1 time.
This commit was sponsored by Mark Reidenbach on Patreon.
inet_addr was removed, but all this needs is localhost, so hardcoding it
should work fine.
It may be that this windows ifdef is no longer needed. It was added in 2013
with a note that getAddrInfo didn't work on windows, but it seems likely
such a problem would have been fixed since.
Which lets progress be displayed when doing concurrent downloads.
Among other things, like --json-progress etc.
The youtube-dl output is no longer displayed, except for any errors.
This commit was sponsored by Denis Dzyubenko on Patreon.
Import small files into git, the same as is done when importing with content.
Which means, for small files, --no-content does download them.
If the largefiles expression needs the file content available
(due to mimetype or mimeencoding being used), the import will fail.
This commit was sponsored by Jake Vosloo on Patreon.
This was the last one marked as a zombie. There might be others I don't
know about, but except for in the hypothetical case of a thread dying
due to an async exception before it can wait on a process it started, I
don't know of any.
It would probably be safe to remove the reapZombies now, but let's wait
and do that in its own commit in case it turns out to cause problems.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/
Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.
This commit was sponsored by Svenne Krap on Patreon.
getPid returns Nothing if the process has already been stopped, and in that
case, the pid will not be displayed. I think that would only happen if
waitForProcess or similar gets called more than once on the same process
handle though.
getPid on unix has an overhead of only an MVar read. On Windows it needs to
make a syscall, so will probably be more expensive. While the added expense
happens even when debug logging is disabled, it should be small enough
compared with the overhead of starting a process that it's not a problem.
(It does occur to me that a debugM that took an IO String could only run it
when debugging is really enabled, which would improve performance. It does
not seem possible to use the current hslogger interface to do that though;
it does not expose the information that would be needed.)
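A sketch of what such a debug function could look like if the enabled state
were available, here assumed to live in an IORef. lazyDebug is a
hypothetical name, and this does not use hslogger:

```haskell
import Control.Monad (when)
import Data.IORef
import System.IO (hPutStrLn, stderr)

-- Sketch only: the action that builds the message is run only when
-- debugging is enabled, so a disabled debug call costs one IORef read.
lazyDebug :: IORef Bool -> IO String -> IO ()
lazyDebug enabled mkmsg = do
    on <- readIORef enabled
    when on $ mkmsg >>= hPutStrLn stderr
```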
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.
Over the course of 2 grueling days.
withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.
Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
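The splitting itself would be simple. A sketch, with groupsOf being a
hypothetical helper:

```haskell
-- Sketch only: split input parameters into groups of at most n, so each
-- group stays under the limit where sorting is given up on.
groupsOf :: Int -> [a] -> [[a]]
groupsOf n xs
    | n <= 0 = error "groupsOf: group size must be positive"
    | null xs = []
    | otherwise = let (g, rest) = splitAt n xs in g : groupsOf n rest
```

With groupsOf 99, 250 input paths would become groups of 99, 99 and 52, and
so three runs of git ls-files instead of one.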
Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/
although I don't know if it does.
My thinking is, ResourceT may allocate a resource and then free it,
and an unforced thunk to that resource could result in reading memory
that has since been overwritten by something else, or in a SEGV,
depending. While that seems kind of like a bug in ResourceT to me, if it
is what's happening, this will avoid it. If it's not, this doesn't
really hurt much since the values are all smallish.
This commit was sponsored by Graham Spencer on Patreon.
That made Utility.FileMode depend on unix, but Utility.Tmp now depends
on it, and is used by Setup, which does not. So it was easiest to remove
this, especially since it's not used.
Fixed several cases where files were created without file mode bits that
the umask would usually set. This included exports to the directory special
remote, torrent files used by the bittorrent special remote, hooks written
by git-annex init, and some log files in .git/annex/
Audited all calls, looking for ones that didn't want the umask bits to be
set. All such turned out to already set the specific restrictive file mode
they wanted.