* Fix bug that could make git-annex importfeed not see recently recorded
state when configured with annex.alwayscommit=false.
* importfeed: Made "checking known urls" phase run 12 times faster.
The massive speedup is because it no longer queries for metadata
accompanying each url. Instead it processes the whole git-annex branch and
checks all metadata files for feed item ids, and uses any it finds.
This could result in a behavior change, in an unlikely situation: If a feed
id is recorded in a key's metadata, but the url gets removed, the old code
would not see that item id and would re-download it if it finds an url for
it in a feed, while the new code will see the item id. I don't think
the old behavior was intentional, and it may be that the new behavior is
better. Not gonna worry about this.
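To illustrate the single-pass lookup, here is a minimal sketch in Python (git-annex itself is written in Haskell, and this is not its code). The `metadata_files` mapping, the `itemid` field name, and both helper functions are assumptions made up for the example.

```python
from typing import Dict, Iterable, List, Set

def known_item_ids(metadata_files: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Collect every feed item id found in any metadata file, in one pass."""
    known: Set[str] = set()
    for fields in metadata_files.values():
        # Files without an item id simply contribute nothing.
        known.update(fields.get("itemid", []))
    return known

def items_to_download(feed_item_ids: Iterable[str], known: Set[str]) -> List[str]:
    """Keep only the feed items whose id has not been seen before."""
    return [item for item in feed_item_ids if item not in known]

# Hypothetical contents of the metadata files on the branch.
metadata_files = {
    "key1.log.met": {"itemid": ["episode-1"], "title": ["First"]},
    "key2.log.met": {"itemid": ["episode-2"]},
    "key3.log.met": {"title": ["No feed item here"]},
}
known = known_item_ids(metadata_files)
todo = items_to_download(["episode-1", "episode-2", "episode-3"], known)
```

Because the set is built from metadata alone, an item id is still seen when its url has been removed, which is the behavior change described above.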
Fix bug caused by recent optimisations that could make git-annex not see
recently recorded status information when configured with
annex.alwayscommit=false.
When not using --all, precaching only gets triggered when the
command actually needs location logs, and so there's no speed hit there.
This is a minor speed hit for --all, because it precaches even when the
location log is not actually going to be used, and so checking the journal
is not necessary. It would have been possible to defer checking the journal
until the cache gets used. But that would complicate the usual Branch.get
code path with two different kinds of caches, and the speed hit is really
minimal. A better way to speed up --all, later, would be to avoid
precaching at all when the location log is not going to be used.
init: Fix a crash when the repo was cloned from a repo that had an
adjusted branch checked out, and the origin remote is not named "origin".
The only other hardcoding of the name of origin is in:
- Upgrade.V2, which can be ignored probably
- Annex.Branch, which doesn't fail if it has some other name, but just
doesn't set up the git-annex branch with quite as linear a history in
that case.
directory: When cp supports reflinks, use it when getting content from a
directory special remote.
Not yet for imports from directory though, and not for store.
Note that, when it's chunked, using cp --reflink would not speed it up, and
when reflink was not supported, would unnecessarily write the chunk to a
file before reading it back in. So, only using a fileRetriever in the
NoChunks case is necessary to keep chunking fast.
fileCopier is told not to verify, because the special remote interface
does not yet support verification in passing. AFAICS, fileCopier can
never return False when not verifying, so the added giveup should never
actually happen.
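A rough sketch of the "try a reflink, otherwise copy normally" idea, in Python rather than git-annex's Haskell. The function name is hypothetical, and using `--reflink=always` (a GNU cp option) with a plain-copy fallback is an assumption for illustration, not necessarily what fileCopier does.

```python
import shutil
import subprocess

def copy_with_reflink(src: str, dst: str) -> bool:
    """Try a reflink copy with cp; fall back to an ordinary copy.

    Returns True when the reflink copy was used.
    """
    try:
        result = subprocess.run(
            ["cp", "--reflink=always", src, dst],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
    except OSError:
        pass  # cp is not available at all
    # Reflinks unsupported (or cp missing): do a regular copy instead.
    shutil.copyfile(src, dst)
    return False
```

This only makes sense for whole, unchunked objects; as noted above, chunks are better kept in memory than written out to a file first.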
When downloading content from a remote, if the content is able to be
verified during the transfer, skip checksumming it a second time.
Note that in this case, the fsck output does not include "(checksum)"
which it does when the checksumming is done separately from the download.
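The idea of verifying during the transfer can be sketched like this in Python. It is only an illustration: the function name, the chunk iterator, and the choice of SHA-256 are assumptions, and git-annex's own implementation is in Haskell.

```python
import hashlib
from typing import Iterable

def download_and_verify(chunks: Iterable[bytes], dest_path: str,
                        expected_sha256: str) -> bool:
    """Write chunks to dest_path while hashing them as they arrive.

    Returns True when the content matches, so no second read of the
    file is needed to checksum it.
    """
    hasher = hashlib.sha256()
    with open(dest_path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            hasher.update(chunk)  # incremental checksum, same pass
    return hasher.hexdigest() == expected_sha256
```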
This commit was sponsored by Brock Spratlen on Patreon.
When git-annex transferrer started up, and the journal contained something,
it would commit it to the git-annex branch. This caused excess commits to
the branch, in cases where normally several changes would be journalled and
committed together. That generated some excess git objects and was also
just noisy on stdout.
Since transferrer uses enableInteractiveBranchAccess, it does not need to
commit journalled changes, since the optimisation that avoids checking
the journal when reading from the branch is disabled for processes that
call that.
This commit was sponsored by Svenne Krap on Patreon.
persistent stopped using askLogFunc, and the thing to use is askLoggerIO
from monad-logger. Bumped the dep to the first version that contained that.
Note that the i386ancient build uses a newer monad-logger than 0.3.10,
so the new versioned dep should not break it, and presumably nothing else
either.
This commit was sponsored by Noam Kremen on Patreon.
Which generated unusual git trees that could confuse git merge,
since they incorrectly had 2 subtrees with the same name.
Root of the bug was a) not testing that at all! but also
b) confusing graftdirs, which contains eg "foo/bar" with
non-recursively read trees, which would contain eg "bar"
when reading a subtree of "foo".
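The mismatch in b) can be shown with a small Python sketch. The function and its arguments are hypothetical. The only point is that names read from a subtree have to be joined with the subtree's path before being compared against full paths like "foo/bar".

```python
import posixpath
from typing import Iterable, List, Set

def entries_to_graft(subtree_path: str, entry_names: Iterable[str],
                     graftdirs: Set[str]) -> List[str]:
    """Return the entries of a non-recursively read subtree that are graft dirs.

    entry_names are relative to subtree_path (eg "bar"), while graftdirs
    holds paths from the top of the tree (eg "foo/bar").
    """
    matches = []
    for name in entry_names:
        full = posixpath.join(subtree_path, name)
        if full in graftdirs:
            matches.append(name)
    return matches
```

Comparing "bar" directly against a set containing "foo/bar" never matches, which is the kind of confusion described above.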
It's worth noting that Annex.Import uses graftTree, but it really
shouldn't have needed to. Eg, when importing into foo/bar from a remote,
it's enough to generate a tree containing foo/bar/x and foo/bar/y that does not
include other files that are at the top of the master branch. It uses
graftTree, so it does include the other files, as well as the foo/bar
tree. git merge will do the same thing for both trees. With that said,
switching it away from graftTree would result in another import
generating a new commit that seems to delete files that were there in a
previous commit, so it probably has to keep using graftTree since it
used it before.
This commit was sponsored by Kevin Mueller on Patreon.
Note that a key with no size field that is hard linked will
result in listImportableContents reporting a file size of 0,
rather than the actual size of the file. One result is that
the progress meter when getting the file will seem to get stuck
at 100%. Another is that the remote's preferred content expression,
if it tries to match against file size, will treat it as an empty file.
I don't see a way to improve the latter behavior, and the former behavior
is a minor enough problem.
This commit was sponsored by Jake Vosloo on Patreon.
Keys stored on the filesystem are mangled by keyFile to avoid problem
chars. So, that mangling has to be reversed when parsing files from a
borg backup back to a key.
The directory special remote also mangles them that way. Some other special
remotes do not; eg S3 just serializes the key -- but S3 object names are
not limited to filesystem valid filenames anyway, so an S3 server must
not map them directly to files in any case. It seems unlikely that a
borg backup of some such special remote will get broken by this change.
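For illustration, a reversible mangling of this kind could look like the following Python sketch. The escape table is an assumption chosen for the example and should be checked against keyFile before relying on it, and both function names are made up.

```python
# Assumed escape table: problem characters map to filename-safe sequences.
_ESCAPES = {"&": "&a", "%": "&s", ":": "&c", "/": "%"}
_UNESCAPES = {"a": "&", "s": "%", "c": ":"}

def mangle(key: str) -> str:
    """Replace characters that are a problem in filenames."""
    return "".join(_ESCAPES.get(c, c) for c in key)

def unmangle(name: str) -> str:
    """Reverse mangle, turning a filename back into the serialized key."""
    out = []
    i = 0
    while i < len(name):
        c = name[i]
        if c == "%":
            out.append("/")
        elif c == "&" and i + 1 < len(name) and name[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[name[i + 1]])
            i += 1  # skip the character that completed the escape
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

A parser for files found in a backup would apply the reverse step before trying to deserialize the key.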
This commit was sponsored by Graham Spencer on Patreon.
New error message:
Remote foo not usable by git-annex; setting annex-ignore
http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1
If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.
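A sketch of the kind of address check that produces such a message, written in Python for illustration only. The function name, the treatment of "all", and the exact set of address classes rejected by default are assumptions, not a description of git-annex's implementation.

```python
import ipaddress
from typing import Iterable, Optional

def check_address(addr: str, allowed: Iterable[str] = ()) -> Optional[str]:
    """Return None when the address may be accessed, else an error message."""
    allowed = list(allowed)
    # Assumed semantics: "all" or an explicit listing permits the address.
    if "all" in allowed or addr in allowed:
        return None
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback or ip.is_private or ip.is_link_local:
        return ("Configuration of annex.security.allowed-ip-addresses "
                "does not allow accessing address " + addr)
    return None
```

Returning the reason, rather than a bare failure, is what lets the caller show it next to the url that failed to download.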
This commit was sponsored by Mark Reidenbach on Patreon.
It changed parseOnly in the ByteString.Lazy module to take a lazy, not
strict ByteString. In all these cases though, we actually had a strict
ByteString, so the most efficient fix, which also happens to avoid needing
ifdefs, is to use the non-lazy module instead.
This commit was sponsored by Denis Dzyubenko on Patreon.
Seems that hasOrigin was never finding origin's git-annex branch, so a new
one got created each time. And so then it later needed to merge the two
branches, which is expensive.
Added --no-track to git branch to avoid it displaying a message about
setting up tracking branches. Of course there's no reason to make the
git-annex branch a tracking branch since git-annex auto-merges it.
Can be set to false to avoid some expensive things needed to support unlocked
files.
See my comment for why this only controls what init sets up, and not other
behavior.
I didn't bother with making the v5 upgrade code path look at this, though
it easily could, because the docs say to run git-annex init after setting
it to make it take effect.
I don't think this was really intentional behavior. It may be that it was
useful to include it so it could be passed to rmurl, since without it rmurl
would not actually remove the url. Since that was changed earlier today,
now seems like a good time to clean up the display of these urls.
This commit was sponsored by Jochen Bartl on Patreon.
fsck: When --from is used in combination with --all or similar options, do
not verify required content, which can't be checked properly when operating
on keys.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
box.com already had a special case, since its renaming was known buggy.
In its case, renaming to the temp file succeeds, but then renaming the temp
file to final destination fails.
Then this 4shared server has buggy handling of renames across directories.
That was already worked around for the temp files when storing exports,
which are now in the same directory as the final filename, but it also
affected renameExport when the file moves between directories.
I'm not entirely clear what happens on the 4shared server when it fails
this way. It kind of looks like it may rename the file to destination and
then still fail.
To handle both, when rename fails, delete both the source and the
destination, and fall back to uploading the content again. In the box.com
case, the temp file is the source, and deleting it makes sure the temp file
gets cleaned up. In the 4shared case, the file may have been renamed to the
destination and so cleaning that up avoids any interference with the
re-upload to the destination.
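The fallback can be sketched as follows, in Python. Everything here is hypothetical: the `BuggyRemote` class is an in-memory stand-in that mimics the suspected 4shared behavior (the rename happens but failure is still reported), and `rename_export` is not git-annex's renameExport.

```python
from typing import Dict

class RenameFailed(Exception):
    """Raised by the hypothetical remote when a rename reports failure."""

class BuggyRemote:
    """In-memory stand-in: renames the file, then reports failure anyway."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def upload(self, name: str, content: bytes) -> None:
        self.files[name] = content

    def delete(self, name: str) -> None:
        self.files.pop(name, None)  # deleting a missing file is fine

    def rename(self, src: str, dest: str) -> None:
        self.files[dest] = self.files.pop(src)
        raise RenameFailed(src)

def rename_export(remote: BuggyRemote, src: str, dest: str,
                  content: bytes) -> None:
    """Rename src to dest; on failure clean up both and upload again."""
    try:
        remote.rename(src, dest)
    except RenameFailed:
        remote.delete(src)   # box.com case: removes the leftover temp file
        remote.delete(dest)  # 4shared case: removes a possibly renamed file
        remote.upload(dest, content)
```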
unregisterurl: Fix a bug that caused an url to not be unregistered when it
is claimed by a special remote other than the web.
See commit f175d4cc90 for rationale.
* rmurl: When youtube-dl was used for an url, it no longer needs to be
prefixed with "yt:" in order to be removed.
* rmurl: If an url is both used by the web and also claimed by another
special remote, fix a bug that caused the url to not be removed.
The youtube-dl change is a consequence of how the bug fix is implemented.
But I also think it's the right thing to do. Consider that, before,
git-annex addurl $url followed by git-annex rmurl $url would not remove the
url in the case where youtube-dl was used. That was surprising behavior.
In the unlikely case where a special remote claims an url, and it's been
added using OtherDownloader, but it was also added already as a web url,
it seems better for rmurl to remove both than to arbitrarily remove only one.
And in the case the bug report was filed for, when an url was added as a
web url, but a special remote now claims it, that should not prevent rmurl
removing the web url.
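As a toy model of the new rmurl behavior, in Python: the recorded urls are treated as a flat set of strings and the function name is made up; only the "yt:" prefix comes from the description above.

```python
from typing import Set

def remove_url(recorded: Set[str], url: str) -> Set[str]:
    """Remove the url in both its plain form and its "yt:" prefixed form."""
    return recorded - {url, "yt:" + url}
```

With this, removing `$url` works whether it was recorded as a web url, via youtube-dl, or both.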
Calling setUrlMissing lets other callers of it behave differently.
Probably the calls to it in eg, Remote.External and Remote.BitTorrent are
fine, since they don't mangle the url and just remove what was provided,
and the OtherDownloader form of a bittorrent url, respectively.
I suspect unregisterurl needs to have a similar change made to rmurl, for
similar reasons.
When autoenabling special remotes of type S3, webdav, or glacier, do not
take login credentials from environment variables, as the user may not be
expecting the autoenable to happen, and may have those set for other
purposes.
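A minimal Python sketch of that rule. The function name and the `autoenabling` flag are hypothetical, and the two environment variable names are used as an assumed example for the S3 case.

```python
from typing import Mapping, Optional, Tuple

def creds_from_environment(env: Mapping[str, str],
                           autoenabling: bool) -> Optional[Tuple[str, str]]:
    """Look up login credentials in the environment, unless autoenabling."""
    if autoenabling:
        # The user did not ask for this remote to be enabled, so any
        # variables they have set may well be intended for something else.
        return None
    login = env.get("AWS_ACCESS_KEY_ID")
    password = env.get("AWS_SECRET_ACCESS_KEY")
    if login and password:
        return (login, password)
    return None
```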
Import was already using ActionItemWorkTreeFile, and it's ok to use it for
export too, even though it might not correspond with a file in the work tree.
And renamed it to ActionItemTreeFile to make that clearer.
Note that when an export has to rename files, it still uses
ActionItemOther, so file will still be null in that case, but as no file is
being transferred, that seems ok.
import: When the previously exported tree contained a submodule,
preserve it in the imported tree so it does not get deleted.
The export exclude log, which was used for non-preferred content,
now also includes the submodules. Since the log format is git ls-tree
output, this does not break backwards compatibility.
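Since the log lines are in git ls-tree format (mode, type, object, then a tab and the path), submodules can be told apart by their mode and type. A small Python sketch, with made-up helper names and sample data; it does not handle the quoting git applies to unusual path names.

```python
from typing import Iterable, List, NamedTuple

class TreeEntry(NamedTuple):
    mode: str
    objtype: str
    sha: str
    path: str

def parse_ls_tree_line(line: str) -> TreeEntry:
    """Parse one "<mode> <type> <object>\\t<path>" line."""
    meta, path = line.split("\t", 1)
    mode, objtype, sha = meta.split(" ")
    return TreeEntry(mode, objtype, sha, path)

def submodule_paths(lines: Iterable[str]) -> List[str]:
    """Submodules appear as mode 160000 entries of type commit."""
    entries = (parse_ls_tree_line(line) for line in lines)
    return [e.path for e in entries
            if e.mode == "160000" and e.objtype == "commit"]

# Hypothetical log contents: one excluded file and one submodule.
log_lines = [
    "100644 blob " + "a" * 40 + "\tdocs/readme.txt",
    "160000 commit " + "b" * 40 + "\tvendor/lib",
]
```

An older reader of the log just sees additional ls-tree lines, which is why the format stays compatible.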
This mostly affects OSX and (possibly) Windows, but the Windows
installer does not bundle git. The linux standalone builds are not
updated yet pending debian stable getting a backport of the security
fix, but the security hole is unlikely to affect linux as
case-insensitive filesystems that support symlinks are a rarity on it.
Using the linux standalone build on windows via WSL is another way it
could be affected.
This commit was sponsored by Brett Eisenberg on Patreon.
Which access a remote using rsync over ssh, and which git pushes to much
more efficiently than ssh urls.
There was some old partial support for rsync URIs from 2013, but it seemed
incomplete, and did not use rsync over ssh. Weird.
I'm not sure if there's any remaining benefit to using the non-rsync url
forms with gcrypt, now that this is implemented? Updated docs to encourage
using the rsync urls.
This commit was sponsored by Svenne Krap on Patreon.
Git.Remote.parseRemoteLocation had a hack to handle URIs that contained
characters like spaces, which is something git unfortunately allows
despite such a string not being a valid URI. However, that hack looked for "//" to
guess something was an URI, and these gcrypt URIs, being to a local
path, don't contain that. So instead escape all illegal characters and
check if the resulting thing is an URI.
And that was already done by Git.Construct.fromUrl, so
internally the gcrypt URI with a space looks like "gcrypt::foo%20bar"
and that needs to be de-escaped when converting back from URI to local
repo path.
This change might also allow a few other almost-valid URIs to be handled
as URIs by git-annex. None that contain "//" will change, and any
behavior change should result in git-annex doing closer to a right thing
than it did before, probably.
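The escape-then-check approach, and the de-escaping on the way back, can be sketched in Python like this. The function names and the simplified URI regex are assumptions for the example; git-annex does this in Haskell with its own URI parsing.

```python
import re
from urllib.parse import quote, unquote

# Simplified check: a scheme, a colon, then only characters legal in a URI.
_URI = re.compile(
    r"^[A-Za-z][A-Za-z0-9+.-]*:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*$"
)

def escape_location(location: str) -> str:
    """Percent-escape characters that are not legal in a URI (eg spaces)."""
    return quote(location, safe=":/?#[]@!$&'()*+,;=%")

def is_uri(text: str) -> bool:
    return _URI.match(text) is not None

def parse_location(location: str):
    """Classify a remote location after escaping illegal characters."""
    escaped = escape_location(location)
    if is_uri(escaped):
        return ("uri", escaped)
    return ("path", location)

def gcrypt_local_path(uri: str) -> str:
    """Convert an escaped gcrypt:: URI back to the local repo path."""
    return unquote(uri[len("gcrypt::"):])
```

Note that nothing here looks for "//", so a location such as "gcrypt::foo bar" is still recognised once the space has been escaped.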
This commit was sponsored by Noam Kremen on Patreon.
Previously such nonsensical combinations always treated the matching option
as if it didn't match.
For now, made find --branch refuse matching options that need a
filename, because one is not provided to them in a way they'll use.
There's an open bug report to support it, but making it error out is
better than the old behavior of not finding what it was asked to.
Also, made --mimetype combined with eg --all work, by looking at the
object file when operating on keys.
Implemented by generalizing registerurl. Without the implicit batch mode
of registerurl since that is only a backwards compatibility thing
(see commit 1d1054faa6).
unannex, uninit: When an annexed file is modified, don't overwrite the
modified version with an older version from the annex.
This commit was sponsored by Mark Reidenbach on Patreon.
This benchmarks only slightly faster than the old git-annex. Eg, for a 1
gb file, 14.56s vs 15.57s. (On a ram disk; there would certainly be
more of an effect if the file was written to disk and didn't stay in
cache.)
Commenting out the updateIncremental calls makes the same run in 6.31s.
May be that overhead in the implementation, other than the actual
checksumming, is slowing it down. Eg, MVar access.
(I also tried using 10x larger chunks, which did not change the speed.)
Changing to the P2P protocol broke this, because preseedTmp copies
the local copy of the object to the temp file, and then the P2P transfer
sees the right length file and uses it as-is.
When git-annex-shell is too old and rsync is used, it did verify the
content, and when the local repo does not have the object it did verify the
content.