When a command is operating on multiple files and there's an error with
one, try harder to continue to the rest. (As was already done for many
types of errors including IO errors.)
This handles cases like lockContentForRemoval throwing an exception when
the content is already locked. Just because a drop of one file fails, does
not mean it shouldn't go on to try to drop other files.
I looked over uses of `giveup` in Command/*; there are too many to check
them all extensively, but none stood out as being problems that should let
one commandAction stop running other commandActions. Worst case, something
bad will happen and rather than stopping right away with an error,
git-annex will display multiple errors as it fails over and over on each
file. I don't think I ever really intended `error`/`giveup` to stop other
commandActions; this was a relic of old confusion over haskell exception
handling.
Test suite passes.
This commit was sponsored by Ethan Aubin.
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.
This commit was sponsored by Trenton Cronholm on Patreon.
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Fix bash completion of "git annex" to propertly handle files with spaces
and other problem characters. (Completion of "git-annex" already did.)
This commit was sponsored by Jake Vosloo on Patreon.
Finishes the start made in 983c9d5a53, by
handling the case where `transfer` fails for some other reason, and so the
ReadContent callback does not get run. I don't know of a case where
`transfer` does fail other than the locking dealt with in that commit, but
it's good to have a guarantee.
StoreContent and StoreContentTo had a similar problem.
Things like `getViaTmp` may decide not to run the transfer action.
And `transfer` could certianly fail, if another transfer of the same
object was in progress. (Or a different object when annex.pidlock is set.)
If the transfer action was not run, the content of the object would
not all get consumed, and so would get interpreted as protocol commands,
which would not go well.
My approach to fixing all of these things is to set a TVar only
once all the data in the transfer is known to have been read/written.
This way the internals of `transfer`, `getViaTmp` etc don't matter.
So in ReadContent, it checks if the transfer completed.
If not, as long as it didn't throw an exception, send empty and Invalid
data to the callback. On an exception the state of the protocol is unknown
so it has to raise ProtoFailureException and close the connection,
same as before.
In StoreContent, if the transfer did not complete
some portion of the DATA has been read, so the protocol is in an unknown
state and it has to close the conection as well.
(The ProtoFailureMessage used here matches the one in Annex.Transfer, which
is the most likely reason. Not ideal to duplicate it..)
StoreContent did not ever close the protocol connection before. So this is
a protocol change, but only in an exceptional circumstance, and it's not
going to break anything, because clients already need to deal with the
connection breaking at any point.
The way this new behavior looks (here origin has annex.pidlock = true so will
only accept one upload to it at a time):
git annex copy --to origin -J2
copy x (to origin...) ok
copy y (to origin...)
Lost connection (fd:25: hGetChar: end of file)
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Fix hang when transferring the same objects to two different clients at the
same time. (Or when annex.pidlock is used, two different objects to the
same or different clients.)
Could also potentially occur if a client was downloading an object and
somehow lost connection but that git-annex-shell was still running and
holding the transfer lock.
This does not guarantee that, if `transfer` fails for some other reason,
a DATA response will be made.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
I suspect this may be due to SQLITE_IOERR_SHORT_READ, but have not
verified.
I was able to reproduce it on Linux after running the test suite in a loop
for 1-3 hours until it failed.
The WAL mode entry change in 3963c5fcf5
may have hidden the problem I was seeing; I have not seen an ErrorIO
since then.
This is safe, because while the annex object ends up executable,
there were already at least two other cases where it ended up executable:
1. git add an an executable file
2. chmod +x of a a non-executable worktree file that was hard linked to the
annex object
After copy/hard link, it always fixes up the permissions to match the mode
of the worktree file, so when an executable annex object gets hard linked
to a non-executable worktree file, its execute bit gets removed.
Commit b7c8bf5274 already *said* it would do
this; I suspect the line of code I've removed was included in that commit
accidentially.
Also improves annex.thin documentation.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
It was trying to git annex adjust when in a direct mode repo, and that
of course fails. What I don't understand though, is how the test suite
managed to work before, when it was clearly checking the wrong thing.
Since the right way to fix it was obvious, I have not bisected.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Block other threads while the export database is being constructed (or
updated) by the first thread to try to access it.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
That could cause git-annex to get confused about whether a locked file's
content was present, when the object file got touched.
Unfortunately this means more work sometimes when annex.thin is set,
since it has to checksum the file to tell if it's still got the right
content.
Had to suppress output when inAnnex calls isUnmodified, otherwise
"(checksum...)" would be printed in places it ought not to be,
eg "git annex get" could turn out not need to get anything, and
so only display that.
This commit was sponsored by Ole-Morten Duesund on Patreon.
* rmurl: Fix a case where removing the last url left git-annex thinking
content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
used to update the presence information of the external special remote
that called them; this was not documented behavior and is no longer done.
Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.
In AddUrl, had to make it update location tracking, to handle the
non-web-url case.
This commit was sponsored by Ewen McNeill on Patreon.
* init: Improve generated post-receive hook, so it won't fail when
run on a system whose git-annex is too old to support git-annex post-receive
* init: Update the post-receive hook when re-run in an existing repository.
This commit was sponsored by Jack Hill on Patreon.
This reverts commit b18fb1e343.
That broke support for old git-annex-shell before p2pstdio was added.
The immediate problem is that postAuth had a fallthrough case
that sent an error back to the peer, but sending an error back when the
connection is closed is surely not going to work.
But thinking about it some more, making every function that uses receiveMessage
need to handle ProtocolEOF adds a lot of complication, so I don't want
to do that.
The commit only cleaned up the test suite output a tiny bit, so I'm just
gonna revert it for now.
download is documented as displaying an error when download fails, but
it didn't when the url was not valid at all. That leads to confusing
behavior.
Also, display the url with --debug
Untested, on FreeBSD but enough to fix the listed build errors.
Seems that System.Posix.Files must have used to export this stuff and it
was split.
This commit was sponsored by Peter on Patreon.
Work around git cat-file --batch's protocol not supporting newlines by
running git cat-file not batched and passing the filename as a
parameter.
Of course this is quite a lot less efficient, especially because it
currently runs it multiple times to query for different pieces of
information.
Also, it has subtly different behavior when the batch process was
started and then some changes were made, in which case the batch process
sees the old index but this workaround sees the current index. Since
that batch behavior is mostly a problem that affects the assistant and has
to be worked around in it, I think I can get away with this difference.
I don't know of any other problems with newlines in filenames, everything
else in git I can think of supports -z. And git-annex's json output
supports newlines in filenames so downstream parsers from git-annex will be ok.
git-annex commands that use --batch themselves don't support newlines
in input filenames; using --json --batch is currently a way around that
problem.
This commit was sponsored by Ewen McNeill on Patreon.
v6: When a file is unlocked but has not been modified, and the unlocking is
only staged, git-annex add did not lock it. Now it will, for consistency
with how modified files are handled and with v5.
Note the removal of the sameInodeCache check. Otherwise it would see
that the unmodified file is unmodified and stop there. That check seems to have
been copied from the direct mode branch. But, direct mode had a specific
reason to check for unmodified content, that does not apply to v6.
The second pass means there is potential for a race, eg the unlocked
file could be modified in between the first and second passes.
No problem with that, since both passes do the same thing.
This commit was sponsored by Jake Vosloo on Patreon.
* Don't use GIT_PREFIX when GIT_WORK_TREE=. because it seems git
does not intend GIT_WORK_TREE to be relative to GIT_PREFIX in that
case, despite GIT_WORK_TREE=.. being relative to GIT_PREFIX.
* Don't use GIT_PREFIX to fix up a relative GIT_DIR, because
git 2.11 sets GIT_PREFIX set to a path it's not relative to.
and apparently GIT_DIR is never relative to GIT_PREFIX.
Commit e50ed4ba48 led us down this path
by working around a git bug by relying on the barely documented GIT_PREFIX.
This commit was sponsored by Trenton Cronholm on Patreon.
v6: Fix annex object file permissions when git-annex add is run on a
modified unlocked file, and in some related cases.
If a hard link is made, don't freeze it; annex.thin
uses writable object files.
Also: For some reason, linkToAnnex used to thawContent src. I can see no
reason why it needed to do that, so I eliminated that.
This commit was sponsored by Brock Spratlen on Patreon.
Probably not noticed until now because the queue is large enough that two
threads each filling theirs at the same time and flushing is unlikely to
happen.
Also made explicit that each worker thread gets its own queue.
I think that was the case before, but if something was put in the queue
before worker threads were forked off, they could have each inherited the
same queue.
Could have gone with a single shared queue, but per-worker queues is more
efficient, because a worker can add lots of stuff to its own queue without
any locking.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Avoids annex.largefiles inconsitency and also avoids a lot of
unneccessary calls to the clean filter when a large repo's clone
is being initialized.
This commit was supported by the NSF-funded DataLad project.
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.
Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.
Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.
This commit was supported by the NSF-funded DataLad project.
When --batch is used with matching options like --in, --metadata, etc, only
operate on the provided files when they match those options. Otherwise, a
blank line is output in the batch protocol.
Affected commands: find, add, whereis, drop, copy, move, get
In the case of find, the documentation for --batch already said it honored
the matching options. The docs for the rest didn't, but it makes sense to
have them honor them. While this is a behavior change, why specify the
matching options with --batch if you didn't want them to apply?
Note that the batch output for all of the affected commands could
already output a blank line in other cases, so batch users should
already be prepared to deal with it.
git-annex metadata didn't seem worth making support the matching options,
since all it does is output metadata or set metadata, the use cases for
using it in combination with the martching options seem small. Made it
refuse to run when they're combined, leaving open the possibility for later
support if a use case develops.
This commit was sponsored by Brett Eisenberg on Patreon.
Added getStaged, to get the versions of git-annex branch files staged in its
index, and use during transitions so the result of merging sibling branches
is used.
The catFileStop in performTransitionsLocked is absolutely necessary,
without that the bug still occurred, because git cat-file was already
running and was looking at the old index file.
Note that getLocal still has cat-file look at the git-annex branch, not the
index. It might be faster if it looked at the index, but probably only
marginally so, and I've not benchmarked it to see if it's faster at all. I
didn't want to change unrelated behavior as part of this bug fix. And as
the need for catFileStop shows, using the index file has added
complications.
Anyway, it still seems fine for getLocal to look at the git-annex branch,
because normally the index file is updated just before the git-annex branch
is committed, and so they'll contain the same information. It's only during
a transition that the two diverge.
This commit was sponsored by Paul Walmsley in honor of Mark Phillips.
Work around git bug that runs smudge/clean filters at the top of the
repository while passing them a relative GIT_WORK_TREE that may point
outside of the repository, by using GIT_PREFIX to get back to the
subdirectory where a relative GIT_WORK_TREE is valid.
git devs have been informed of the bug and may fix it, which could conveivably
break this fix, but as it is, this works back to git 1.7.6.
This commit was sponsored by Jochen Bartl on Patreon.
Send User-Agent and any configured annex.http-headers when downloading with
http, fixes reversion introduced when switching to http-client.
This commit was sponsored by mo on Patreon.
Display error messages that come from git-annex-shell when the p2p protocol
is used, so that diskreserve messages, IO errors, etc from the remote side
are visible again.
Felt like it should perhaps use outputError, so --json-error-messages would
include these, but as an async IO action, it can't, and this would need
MessageState to be converted to a tvar. Anyway, when not using p2pstdio,
that's not done; nor is it done for stderr from external special remotes
or other commands, so punted on the idea for now.
This commit was sponsored by mo on Patreon.
I can't find any documentation of how long it should be. Hard to imagine
it being shorter than 4 characters though, so put that in as a conservative
lower bound.
This commit was sponsored by Nick Piper on Patreon.
Fixed annex-checkuuid implementation, so that remotes configured that way
can be used. This was 100% broken from the first commit of it, oops.
This commit was sponsored by Øyvind Andersen Holm.
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.
The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.
The additional IO action should have no performance impact as long as
it's simply return.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
Show operating system and repository version list when run outside
a git repo too.
Also made it only display the local repository version when in a git-annex
repo. Before it showed "unknown" when run in a git repo that was not
git-annex initialized. That seemed like confusing behavior.
This commit was sponsored by Jochen Bartl on Patreon.
https://prime.haskell.org/wiki/Libraries/Proposals/SemigroupMonoid
I am not happy with the fragile pile of CPP boilerplate required to support
ghc back to 7.0, which git-annex still targets for both the android build
and the standalone build targeting old linux kernels. It makes me unlikely
to want to use Semigroup more in git-annex, because the benefit of the
abstraction is swamped by the ugliness. I actually considered ripping out
all the Semigroup instances, but some are needed to use
optparse-applicative.
The problem, I think, is they made this transaction on too fast a timeline.
(Although ironically, work on it started in 2015 or earlier!)
In particular, Debian oldstable is not out of security support, and it's
not possible to follow the simpler workarounds documented on the wiki and
have it build on oldstable (because the semigroups package in it is too
old).
I have only tested this build with ghc 8.2.2, not the newer and older
versions that branches of the CPP support. So there could be typoes, we'll
see.
This commit was sponsored by Brock Spratlen on Patreon.
* migrate: Fix bug in migration between eg SHA256 and SHA256E,
that caused the extension to be included in SHA256 keys,
and omitted from SHA256E keys.
(Bug introduced in version 6.20170214)
* migrate: Check for above bug when migrating from SHA256 to SHA256
(and same for SHA1 to SHA1 etc), and remove the extension that should
not be in the SHA256 key.
* fsck: Detect and warn when keys need an upgrade, either to fix up
from the above migrate bug, or to add missing size information
(a long ago transition), or because of a few other past key related
bugs.
This commit was sponsored by Henrik Riomar on Patreon.
Prevent haskell http-client from decompressing gzip files, so downloads of
such files works the same as it used to with wget and curl.
Explicitly setting accept-encoding to "identity" is probably not needed,
but that's what wget sends (curl does not send the header), and since
http-client is trying to be excessively smart, it seems we need to set
hAcceptEncoding to something to prevent it from inserting its own,
and this seems better than some hack like "".
This commit was sponsored by Ole-Morten Duesund on Patreon.
* move: --force was accidentially enabling two unrelated behaviors
since 6.20180427. The older behavior, which has never been well
documented and seems almost entirely useless, has been removed.
* copy: --force no longer does anything.
This commit was sponsored by Øyvind Andersen Holm.
This fixes a crash when a git submodule has a name starting with a dot.
Such a submodule might contain dotfiles that are intended to be used when
inside the view (since a dot-directory that's not a submodule was already
preserved when entering a view). So, rather than eliminating the submodule
from the view, its git ls-files --stage hash is copied over into the view.
dotfiles/dirs have their git ls-files --stage hashes similarly copied over
to the view. This is more efficient and simpler than the old method,
and also won't break if git ever adds a new type of tree item, like was
done with submodules.
Since the content of dotfiles in the working tree is no longer hashed
when entering a view, when there are unstaged modifications, they are
not included in the view branch. Entering the view branch still works,
but git checkout shows "M .dotfile", and git diff will show the unstaged
changes. This seems like an improvement over the old behavior.
Also made Command.View not delete empty directories that are submodules
when entering a view, while still deleting other empty directories.
This commit was supported by the NSF-funded DataLad project.
* Display error message when http download fails.
There's nothing in the http-client library to nicely format a http
exception, so in some cases it has to fall back to using show on it.
Seems better than just saying "it failed" or only showing the http
status code.
* Avoid forward retry when 0 bytes were received.
forwardRetry was comparing Nothing to Just 0, and so thought there had
been progress made when 0 bytes were received.
This commit was supported by the NSF-funded DataLad project.
The old git-annex Android app is now deprecated in favor of running
git-annex in termux. I suspect all or nearly all of these no longer apply.
This commit was sponsored by Jochen Bartl on Patreon.
Fix regression in last release that crashes when using --all or running
git-annex in a bare repository. May have also affected git-annex unused and
git-annex info.
Reversed the order of the (++) in Annex.Branch.files so --all will stream
lazily still when there are not a bunch of uncommitted journal files.
Added a todo to maybe improve this later.
This commit was sponsored by Trenton Cronholm on Patreon.
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.
It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.
This commit was supported by the NSF-funded DataLad project.
See the big comment at the bottom of Command.Drop for the full details.
(The --safe/--unsafe options were never released.)
This commit was sponsored by Jake Vosloo on Patreon.
move: Added --safe option, which makes move honor numcopies settings.
Also --unsafe enables the default behavior, anticipating that the
default may one day change.
This commit was sponsored by Ethan Aubin.
When adding a new version of a file, and annex.genmetadata is enabled,
don't copy the data metadata from the old version of the file, instead use
the mtime of the file. Rationalle being that the user has requested to
generate metadata and so would expect to get the new mtime into metadata.
Also, avoid warning about copying metadata when all the old metadata is
date metadata. Which was rather the harder part.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
The pipe's FDs got inherited by ssh and it did something that kept them
open even once it exited. Probably involving passing them on to the ssh
mux daemon.
Set close on exec, and all is well.
Kept Annex.Ssh not using processTranscript even though it no longer
hangs when it does use it, just because processTranscript is overkill
there.
This commit was supported by the NSF-funded DataLad project.
Fix race condition in ssh warmup that caused git-annex to get stuck and
never process some while when run with high levels of concurrency.
So far, I've isolated the problem to processTranscript, which hangs
reading output from ssh in this situation. I don't yet understand why
processTranscript behaves that way.
Since here we don't care about the ssh output, and only want to /dev/null
it, changed to not use processTranscript, avoiding its problem.
This commit was supported by the NSF-funded DataLad project.
Avoid creating transfer info file before transfer lock is created and
locked.
The wrong order for one thing caused transfer info to be overwritten
when a transfer was already in progress.
But worse, it caused checkTransfer to see the transfer info,
and so lock the transfer lock in order to verify the transfer was not in
progress. Which in a concurrent situation, prevented the transferrer
from locking the transfer lock, so it failed with "transfer already in
progress".
Note that the transferinfo command does not lock the transfer lock
before creating the transfer info. But, that's only run after
recvkey is running, and recvkey does lock the transfer lock, so that
seems more or less ok. (Other than being a super complicated legacy mess
that the P2P code has mostly obsoleted now.)
This commit was supported by the NSF-funded DataLad project.
Do not treat parts of the filename that contain punctuation or other
non-alphanumeric characters as extensions. Before, such characters were
filtered out.
Note that in 45308ec78b "foo.ba__________r"
was munged to ".bar" and so incorrectly treated as an extension. That was
fixed by changing the filter order, but not allowing punctuation seems a
better fix.
This assumes that extensions containing punctuation are rare. "_" seems the
most likely character; I used it in ikiwiki "._comment" files. But I can't
recall seeing it anywhere else. It certianly seems that no commonly used
extensions contain punctuation. If git-annex doesn't treat "._comment"
as an extension, it's not likely to break software that expects to see that
extension like some software expects to see .epub or .mp3.
This commit was sponsored by Jack Hill on Patreon.
sync: Fix bug that prevented pulling changes into direct mode repositories
that were committed to remotes using git commit rather than git-annex sync.
This commit was supported by the NSF-funded DataLad project.
tips/automatically_adding_metadata/pre-commit-annex: Fix to not silently
skip filenames containing non-ascii characters.
git diff-index defaults to munging non-ascii characters. Using -z makes
it not do that, and then we just change the nulls to newlines.
This commit was sponsored by Jochen Bartl on Patreon.
Added annex.merge-annex-branches config setting which can be used to
disable automatic merge of git-annex branches.
I wonder if git-annex merge/sync/assistant should disable this
setting? Not sure yet, so have not done so. May be that users will not set
it in git config, but pass it via -c to commands that need it.
Checking the config setting adds a very small overhead, but it's
only checked once per command so should be insignificant.
This commit was supported by the NSF-funded DataLad project.
Repositories that are upgraded from before that version to this
one will not break, but will just not see the benefit of the mergedrefs log
speeding things up, until one new ref gets merged in.
Added remote.<name>.annex-checkuuid config, which can be set to false to
disable the default checking of the uuid of remotes that point to
directories. This can be useful to avoid unncessary drive spin-ups and
automounting.
Note that the UUID check is still done before writing to the repository,
to avoid writing to the wrong repository if it got relocated. Check is
also done before checkPresent to avoid getting confused about what is in
which repo. This is effectively the same as the use of git-annex-shell
with a uuid to check that the remote repository is the expected one.
Did not bother with the check for retrieveKeyFile because it doesn't
matter if the wrong repo is used then.
This commit was sponsored by Trenton Cronholm on Patreon.
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.
Got rid of the remotes member of Git.Repo. This was a bit painful.
Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.
This commit was sponsored by Jake Vosloo on Patreon.
Fourth or fifth try at this and finally found a way to make it work.
Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.
Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
addurl: When the file youtube-dl will download is already an annexed file,
don't download it again and fail to overwrite it, instead just do nothing,
like it used to when quvi was used.
This commit was sponsored by Anthony DeRobertis on Patreon.
This reverts commit 51228c2306.
No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.
With cabal, System.Posix.Process and System.Posix.Env are both missing.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.
This commit was sponsored by Fernando Jimenez on Patreon.
lookupkey: Support being given an absolute filename to a file within the
current git repository.
This commit was supported by the NSF-funded DataLad project.
Similar to c6e4bc0a22 but another code
path. As well as using youtube-dl unecessarily, it used the filename it
comes up with, which while nice for youtube videos, is not right for
other files.
This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.
This commit was supported by the NSF-funded DataLad project.
Now youtubeDlCheck downloads the beginning of the url's content and
checks if it's html, only when it is does it pass it off the youtube-dl
to check if it supports it.
This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.
As well as the reported bug, this also fixes behavior when an url
was added with youtube-dl, but the url content has now changed from
a html page to something else. Remote.Web.checkKey used to wrongly
succeed in that situation, since youtube-dl said sure it can download
that something else.
This commit was supported by the NSF-funded DataLad project.
initremote, enableremote: Really support gpg subkeys suffixed with an
exclamation mark, which forces gpg to use a specific subkey. (Previous try
had a bug.)
This commit was sponsored by Jake Vosloo on Patreon.
When there are multiple urls for a file, still treat it as being present
in the web when some urls don't work, as long as at least one url does
work.
This is consistent with the other web methods handling of multiple urls.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Actual problem is the keyName was set to "Ref \"sha\"", which led to
this follow-on failure since it contained a space.
The bad data would also get into the export database when exporting to a
non-external special remote. Looking briefly at that, I don't think the bad
data will lead to anything more than a re-upload of the file content
now that the problem has been fixed.
This commit was sponsored by Peter Hogg on Patreon.
Windows: Fix reversion that caused the path used to link to annexed
content include the drive letter and full path, rather than being
relative. (`git annex fix` will fix up after this problem).
I've not identified the commit that brought the reversion (probably it
happened this spring when I was removing MisingH and last touched
Utility.Path). Likely commit 18b9a4b8024115db67ae309fdaf54e1553037529?
The problem is that relPathDirToFile got called two paths that had the
slashes different ways around. Since takeDrive includes the first slash,
this made two paths on the same drive seem different and it bailed.
(ifdefs around this to avoid doing extra work on non-windows)
This commit was sponsored by Jack Hill on Patreon.
This avoids all the complication about redundant work discussed in
the previous try at fixing this. At the expense of needing each command
that could have the problem to be patched to simply wrap the action in
onlyActionOn once the key is known. But there do not seem to be many
such commands.
onlyActionOn' should not be used with a CommandStart (or CommandPerform),
although the types do allow it. onlyActionOn handles running the whole
CommandStart chain. I couldn't immediately see a way to avoid mistken
use of onlyActionOn'.
This commit was supported by the NSF-funded DataLad project.
After a false start, I found a fairly non-intrusive way to deal with it.
Although it only handles transfers -- there may be issues with eg
concurrent dropping of the same key, or other operations.
There is no added overhead when -J is not used, other than an added
inAnnex check. When -J is used, it has to maintain and check a small
Set, which should be negligible overhead.
It could output some message saying that the transfer is being done by
another thread. Or it could even display the same progress info for both
files that are being downloaded since they have the same content. But I
opted to keep it simple, since this is rather an edge case, so it just
doesn't say anything about the transfer of the file until the other
thread finishes.
Since the deferred transfer action still runs, actions that do more than
transfer content will still get a chance to do their other work. (An
example of something that needs to do such other work is P2P.Annex,
where the download always needs to receive the content from the peer.)
And, if the first thread fails to complete a transfer, the second thread
can resume it.
But, this unfortunately means that there's a risk of redundant work
being done to transfer a key that just got transferred.
That's not ideal, but should never cause breakage; the same
thing can occur when running two separate git-annex processes.
The get/move/copy/mirror --from commands had extra inAnnex checks added,
inside the download actions. Without those checks, the first thread
downloaded the content, and then the second thread woke up and
downloaded the same content redundantly.
move/copy/mirror --to is left doing redundant uploads for now. It
would need a second checkPresent of the remote inside the upload
to avoid them, which would be expensive. A better way to avoid
redundant work needs to be found..
This commit was supported by the NSF-funded DataLad project.
git annex add, git annex lock etc make multiple seek passes,
and each seek pass checked that files existed. That was unncessary
redundant work.
Fixed by adding a new WorkTreeItem type, make seek actions use it,
and check that the files exist when constructing it.
This commit was supported by the NSF-funded DataLad project.
Before, there was a window where interrupting an add could result in the
file being moved into the annex, with no symlink yet created.
This commit was supported by the NSF-funded DataLad project.