Not likely to be any speed gain here, but this completes porting every
log file over.
And it let me get rid of code that had been copied from ghc and modified,
simplifying the licensing.
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.
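As a rough illustration of that approach (attoparsec; illustrative names,
not the real Logs.UUIDBased code):

    import qualified Data.Attoparsec.ByteString.Char8 as A
    import qualified Data.ByteString.Char8 as B

    -- takeWhile1 requires at least one matching character, so a log line
    -- that was prefixed with " " by the old NoUUID workaround simply fails
    -- to yield a uuid, with no explicit special case needed.
    uuidParser :: A.Parser B.ByteString
    uuidParser = A.takeWhile1 (/= ' ')

    main :: IO ()
    main = do
        print (A.parseOnly uuidParser (B.pack "e605dca6-446a-11e0-8b2a-002170d25c55 description"))
        print (A.parseOnly uuidParser (B.pack " line stored by the old NoUUID bug"))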
There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.
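In miniature, the old behavior was equivalent to this bit of plain Haskell:

    -- Splitting into words and recombining collapses any run of
    -- whitespace, including tabs, into a single space.
    main :: IO ()
    main = do
        putStrLn (unwords (words "foo\tbar"))  -- prints "foo bar"
        putStrLn (unwords (words "foo  bar"))  -- prints "foo bar"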
Otoh, some other parsers relied on the old behavior, and the attoparsec
rewrites had to deal with the issue themselves...
For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1 group2 " due to excess
whitespace, which would be even more confusing behavior.
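A minimal sketch of a whitespace-tolerant parse along those lines
(attoparsec; illustrative, not the real Logs.Group code):

    import qualified Data.Attoparsec.ByteString.Char8 as A
    import qualified Data.ByteString.Char8 as B

    -- Group names are whatever lies between runs of whitespace, so
    -- leading, trailing, and repeated spaces are all ignored.
    groupsParser :: A.Parser [B.ByteString]
    groupsParser = A.skipSpace *> A.sepBy (A.takeWhile1 (not . A.isSpace)) A.skipSpace

    main :: IO ()
    main = print (A.parseOnly groupsParser (B.pack " group1  group2 "))
    -- Right ["group1","group2"]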
The only git-annex branch log file that is not converted to attoparsec
and bytestring-builder now is transitions.log.
Mostly didn't push the ByteStrings down very deep, but none of these log
files are written to frequently, so the slight remaining inefficiency
doesn't matter.
In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in
git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was
last touched by that ancient git-annex version, the descriptions of remotes
would appear missing when used with this version of git-annex. That is such minor
breakage, and so unlikely to still be a problem for any repos, that it was not
worth forward-porting that code to ByteString.
Tested on an older ghc by enabling MonadFailDesugaring globally.
In TransferQueue, the lack of a MonadFail for STM exposed what would
normally be a bug in the pattern matching, although in this case an
earlier check that the queue was not empty avoided a pattern match
failure.
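The shape of the issue, as an illustrative sketch (not the actual
TransferQueue code):

    import Control.Concurrent.STM

    -- A failable pattern like "(x:rest) <- readTVar queue" inside an STM
    -- do-block desugars to MonadFail.fail under MonadFailDesugaring, and
    -- STM has no MonadFail instance, so GHC rejects it. Matching
    -- explicitly keeps the partiality visible and compiles everywhere.
    popQueue :: TVar [a] -> STM (Maybe a)
    popQueue queue = do
        xs <- readTVar queue
        case xs of
            (x:rest) -> do
                writeTVar queue rest
                return (Just x)
            [] -> return Nothing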
This is not as efficient as using ByteStrings throughout, but converting
the String to ByteString is actually significantly faster than the old
parser.
benchmarking parse/old
time                 9.657 μs   (9.600 μs .. 9.732 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.703 μs   (9.645 μs .. 9.785 μs)
std dev              231.6 ns   (161.5 ns .. 323.7 ns)
variance introduced by outliers: 25% (moderately inflated)

benchmarking parse/new
time                 834.6 ns   (797.1 ns .. 886.9 ns)
                     0.987 R²   (0.976 R² .. 0.999 R²)
mean                 816.4 ns   (802.7 ns .. 845.1 ns)
std dev              62.39 ns   (37.66 ns .. 108.4 ns)
variance introduced by outliers: 82% (severely inflated)
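The output above is in criterion's format; a harness along these lines
produces such a comparison (the two parsers here are stand-ins to show the
shape, not the real old and new log parsers):

    import Criterion.Main
    import qualified Data.ByteString.Char8 as B

    -- Stand-in "parsers"; the new one includes the String -> ByteString
    -- conversion in the benchmark, matching the point above.
    oldParse :: String -> [String]
    oldParse = words

    newParse :: B.ByteString -> [B.ByteString]
    newParse = B.words

    sample :: String
    sample = "e605dca6-446a-11e0-8b2a-002170d25c55 here timestamp=1317929189.157237s"

    main :: IO ()
    main = defaultMain
        [ bgroup "parse"
            [ bench "old" (nf oldParse sample)
            , bench "new" (nf (newParse . B.pack) sample)
            ]
        ]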
There is a small behavior change from the old parsePOSIXTime,
which accepted any amount of trailing whitespace after the timestamp.
That behavior was not documented, and it doesn't seem anything relied on it.
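An illustrative way to get that strictness with attoparsec (not the real
parsePOSIXTime, which may need more care about precision than Double
provides):

    import qualified Data.Attoparsec.ByteString.Char8 as A
    import qualified Data.ByteString.Char8 as B
    import Data.Time.Clock.POSIX (POSIXTime)

    -- Require the input to end right after the number of seconds, so
    -- trailing whitespace is no longer silently accepted.
    parseTimestamp :: B.ByteString -> Either String POSIXTime
    parseTimestamp = A.parseOnly (realToFrac <$> A.double <* A.endOfInput)

    main :: IO ()
    main = do
        print (parseTimestamp (B.pack "1542683670.882701"))
        print (parseTimestamp (B.pack "1542683670.882701 "))  -- now rejected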
It used to display the "bad feed content" message indicating there were no
enclosures found, which was misleading when the http request for the feed
failed.
This commit was sponsored by Ewen McNeill on Patreon.
downloadUrl uses meteredFile, which sets up one progress meter,
and Remote.Web also uses metered, so two progress meters are displayed for
the same download.
Reversion introduced with the http-conduit switch in
c34152777b -- I don't know why the extra
call to metered was added there.
When -J is not used, the extra progress meter was not displayed,
but an extra blank line did get output; that is also fixed.
This commit was sponsored by John Pellman on Patreon.
init: When --version=5 is passed on a crippled filesystem, use a v5 direct
mode repo as requested, rather than upgrading to v7 adjusted unlocked.
Fixed test suite on crippled filesystems, making it request --version=5
to test direct mode.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.
Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc there, so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
This fixes a bug with the numcopies counting when using sync --content.
It did not always pass the local repo uuid to handleDropsFrom, and so the
numcopies counting was off by one, and unwanted local content would only be
dropped when there were numcopies+1 remote copies.
Also, support dropping local content that has reached an
exporttree remote that is not untrusted (currently only S3 remotes
with versioning).
* Fix bug upgrading from direct mode to v7: when files in the repository
were already committed as v7 unlocked files elsewhere, and the
content was present in the direct mode repository, the annexed files
got their full content checked into git.
* Fix bug that caused v7 unlocked files in a direct mode repository
to get locked when committing.
This commit was sponsored by Nick Piper on Patreon.
When a file was already unlocked, but the annex object was present, the
upgrade process populated the unlocked file, but neglected to update the
index.
This commit was sponsored by Jochen Bartl on Patreon.
webdav: When initializing, avoid trying to make a directory at the top of
the webdav server, which could never accomplish anything and failed on
nextcloud servers. (Reversion introduced in version 6.20170925.)
This commit was sponsored by mo on Patreon.
No deprecation warning at run time, just one on the man page.
One thing findref remains able to do that find cannot is to run in a bare
repo. Find was made to refuse to run in a bare repo because it seemed
confusing for it to not list any files ever in that situation. It would be
better for find --branch to work in a bare repo, but not find without
--branch; I don't currently have a way to do that.
Probably a better solution would be to make git-annex in a bare repo
default to --branch master or something like that instead of --all.
This commit was sponsored by Denis Dzyubenko on Patreon.
* findref: Support file matching options: --include, --exclude,
--want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin
* Commands supporting --branch now apply file matching options --include,
--exclude, --want-get, --want-drop to filenames from the branch.
Previously, combining --branch with those would fail to match anything.
* add, import, findref: Support --time-limit.
This commit was sponsored by Jake Vosloo on Patreon.
When public access is used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.
When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for exports.
In both cases, display a hopefully useful warning.
This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.
Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Note that it does not prevent storing p2p access tokens or multicast
encryption keys, since those are not cached; the previous commit
established the distinction.
How well this works depends on how often getRemoteCredPair is called and
how expensive it is. In some cases setting this will result in an annoying
number of gpg password prompts and/or slowdowns due to reading creds
from the git-annex branch and decrypting, which could be improved by calling
getRemoteCredPair less often.
This commit was sponsored by Ilya Shlyakhter on Patreon.
dropunused: When an unused object file has gotten modified, eg due to
annex.thin being set, don't silently skip it, but display a warning and let
--force drop it.
This commit was sponsored by Ethan Aubin.
info: When used with an exporttree remote, includes an "exportedtree" info,
which is the tree last exported to the remote. During an export conflict,
multiple values will be listed.
This commit was sponsored by John Pellman on Patreon.
* init: When a crippled filesystem causes an adjusted unlocked branch to
be used, set repo version to 7, which it neglected to do before.
* init: When on a crippled filesystem, and the git version is too old
to use an adjusted unlocked branch, fall back to using direct mode.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Seems that youtube-dl --get-filename on a playlist lists all the filenames
for the playlist, which can take quite some time. The code already only
took the first name, so --no-playlist can speed it up a lot.
This commit was sponsored by Brett Eisenberg on Patreon.
And added stack-lts-9.9.yaml to support old versions of stack.
The i386 ancient autobuilder needs stack-lts-9.9.yaml; the OSX autobuilder
may also use it for a while, and it's needed to build on eg debian stable.
That didn't actually happen; newer lts releases like that one are not
supported by the version of stack in Debian stable, which is used for the
i386-ancient autobuild, and generally I want git-annex to be buildable on
stable releases of linux distros etc. So stack.yaml is going to be stuck on
old versions for some time, until some years after stack stops breaking
backwards compatibility.
When a command is operating on multiple files and there's an error with
one, try harder to continue to the rest. (As was already done for many
types of errors including IO errors.)
This handles cases like lockContentForRemoval throwing an exception when
the content is already locked. Just because a drop of one file fails, does
not mean it shouldn't go on to try to drop other files.
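The general shape of that, as a minimal sketch (plain exceptions;
illustrative, not the actual commandAction machinery; a real implementation
would likely also avoid catching async exceptions):

    import Control.Exception (SomeException, try)

    -- Run an action per file, catching any exception so a failure on one
    -- file does not prevent the remaining files from being processed.
    -- The overall result can still reflect that something failed.
    forFiles :: (FilePath -> IO ()) -> [FilePath] -> IO Bool
    forFiles a = fmap and . mapM go
      where
        go f = do
            r <- try (a f) :: IO (Either SomeException ())
            case r of
                Left e -> do
                    putStrLn ("git-annex: " ++ f ++ ": " ++ show e)
                    return False
                Right () -> return True

    main :: IO ()
    main = forFiles (\f -> readFile f >>= putStr) ["a", "b"] >>= print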
I looked over uses of `giveup` in Command/*; there are too many to check
them all extensively, but none stood out as being problems that should let
one commandAction stop running other commandActions. Worst case, something
bad will happen and rather than stopping right away with an error,
git-annex will display multiple errors as it fails over and over on each
file. I don't think I ever really intended `error`/`giveup` to stop other
commandActions; this was a relic of old confusion over haskell exception
handling.
Test suite passes.
This commit was sponsored by Ethan Aubin.
* drop -J: Avoid processing the same key twice at the same time when
multiple annexed files use it.
This prevents a drop of a key conflicting with another drop of the same
key.
This commit was sponsored by Brock Spratlen on Patreon.
export, sync --content: Avoid unnecessarily trying to upload files to an
exporttree remote that already contains the files.
When the export was originally made in one repo and now git-annex is
running in a different repo, the export database is not yet populated with
information about the exportLocation of files. So, it was trying to upload
the files to the export, even when it already contained them.
sync --content would first download the content from the export, and then
re-upload the content back.
And this also led to "not available" failures for each file that was not
locally present yet.
Fix: Just use checkPresentExport before uploading; if it succeeds update
the database.
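A sketch of the shape of that fix (illustrative names, not the real export
code):

    -- Before uploading a file to the exporttree remote, ask the remote
    -- whether it already has the file at that export location; if so,
    -- just record that in the export database instead of re-uploading.
    uploadUnlessPresent
        :: (FilePath -> IO Bool)  -- checkPresentExport
        -> (FilePath -> IO ())    -- record location in the export database
        -> (FilePath -> IO ())    -- upload to the export as before
        -> FilePath
        -> IO ()
    uploadUnlessPresent checkpresent record upload loc = do
        present <- checkpresent loc
        if present
            then record loc
            else upload loc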
This is a surprising oversight; it's possible it fixes a reversion, because
I would have thought I'd have noticed this problem when originally
developing exporttree remotes.
This commit was sponsored by Jochen Bartl on Patreon.
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.
This commit was sponsored by Trenton Cronholm on Patreon.
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Fix bash completion of "git annex" to properly handle files with spaces
and other problem characters. (Completion of "git-annex" already did.)
This commit was sponsored by Jake Vosloo on Patreon.
Finishes the start made in 983c9d5a53, by
handling the case where `transfer` fails for some other reason, and so the
ReadContent callback does not get run. I don't know of a case where
`transfer` does fail other than the locking dealt with in that commit, but
it's good to have a guarantee.
StoreContent and StoreContentTo had a similar problem.
Things like `getViaTmp` may decide not to run the transfer action.
And `transfer` could certainly fail, if another transfer of the same
object was in progress. (Or a different object when annex.pidlock is set.)
If the transfer action was not run, the content of the object would
not all get consumed, and so would get interpreted as protocol commands,
which would not go well.
My approach to fixing all of these things is to set a TVar only
once all the data in the transfer is known to have been read/written.
This way the internals of `transfer`, `getViaTmp` etc don't matter.
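Roughly, the idea looks like this (illustrative, not the real P2P protocol
code):

    import Control.Concurrent.STM

    -- The callback that consumes the transferred data sets the flag as its
    -- last step. After the transfer machinery returns, the flag is checked;
    -- if some layer (transfer, getViaTmp, ...) decided not to run the
    -- callback, the flag is still False and the connection has to be
    -- treated as being in an unknown state.
    main :: IO ()
    main = do
        done <- newTVarIO False
        let consumer = do
                -- ... read or write all of the object data here ...
                atomically (writeTVar done True)
        runTransferMachinery consumer
        completed <- readTVarIO done
        putStrLn $ if completed
            then "all data consumed; protocol state is known"
            else "callback was skipped; close the connection"
      where
        -- stand-in for transfer/getViaTmp etc, which may or may not run
        -- the callback it is given
        runTransferMachinery :: IO () -> IO ()
        runTransferMachinery = id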
So in ReadContent, it checks if the transfer completed.
If not, as long as it didn't throw an exception, send empty and Invalid
data to the callback. On an exception the state of the protocol is unknown
so it has to raise ProtoFailureException and close the connection,
same as before.
In StoreContent, if the transfer did not complete, some portion of the DATA
has been read, so the protocol is in an unknown state and it has to close
the connection as well.
(The ProtoFailureMessage used here matches the one in Annex.Transfer, which
is the most likely reason. Not ideal to duplicate it..)
StoreContent did not ever close the protocol connection before. So this is
a protocol change, but only in an exceptional circumstance, and it's not
going to break anything, because clients already need to deal with the
connection breaking at any point.
The way this new behavior looks (here origin has annex.pidlock = true so will
only accept one upload to it at a time):
git annex copy --to origin -J2
copy x (to origin...) ok
copy y (to origin...)
Lost connection (fd:25: hGetChar: end of file)
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Fix hang when transferring the same objects to two different clients at the
same time. (Or when annex.pidlock is used, two different objects to the
same or different clients.)
Could also potentially occur if a client was downloading an object and
somehow lost connection but that git-annex-shell was still running and
holding the transfer lock.
This does not guarantee that, if `transfer` fails for some other reason,
a DATA response will be made.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Not the first time this kind of test suite breakage has happened..
It would be good to somehow avoid it looking up from .t and finding a git
repo. But just running the test suite from time to time outside of
git-annex would also let me notice these before the distribution packagers
do.
This commit was sponsored by mo on Patreon.