giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)
error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.
Changed a lot of existing error to giveup when it is not strictly an
internal error.
Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.
Sponsored-by: Noam Kremen on Patreon
This is by no means complete, but escaping filenames in actionItemDesc does
cover most commands.
Note that for ActionItemBranchFilePath, the value is branch:file, and I
choose to only quote the file part (if necessary). I considered quoting the
whole thing. But, branch names cannot contain control characters, and while
they can contain unicode, git coes not quote unicode when displaying branch
names. So, it would be surprising for git-annex to quote unicode in a
branch name.
The find command is the most obvious command that still needs to be
dealt with. There are probably other places that filenames also get
displayed, eg embedded in error messages.
Some other commands use ActionItemOther with a filename, I think that
ActionItemOther should either be pre-sanitized, or should explicitly not
be used for filenames, so that needs more work.
When --json is used, unicode does not get escaped, but control
characters were already escaped in json.
(Key escaping may turn out to be needed, but I'm ignoring that for now.)
Sponsored-by: unqueued on Patreon
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.
Noisy due to some refactoring into Types/
Currently have three old versions of functions that more reworking is
needed to remove: getDaemonStatusOld, modifyDaemonStatusOld_, and
modifyDaemonStatusOld
Converted several threads to run in the monad.
Added a lot of useful combinators for working with the monad.
Now the monad includes the name of the thread.
Some debugging messages are disabled pending converting other threads.
This is handled differently for inotify, which can track modifications of
existing files, and kqueue, which cannot (TTBOMK). On the inotify side,
the TransferWatcher just waits for the file to be updated and reads the new
bytesComplete. On the kqueue side, the TransferPoller has to re-read the
file every update (currently 0.5 seconds, might need to increase that).
I did think about working around kqueue's limitations by somehow creating
a new file each time the size changed. But cleaning up all the files that
would result seemed difficult. And really, this is not a lot worse than
the TransferWatcher's behavior for downloads, which stats a file every 0.5
seconds. As long as the OS has decent file caching behavior..
This ensures file propigate takes place in situations such as: Usb drive A
is connected to B. A's master branch is already in sync with B, but it is
being used to sneakernet some files around, so B downloads those. There is no
master branch change, so C does not request these files. B needs to upload
the files it just downloaded on to C, etc.
My first try at this, I saw loops happen. B uploaded to C, which then
tried to upload back to B (because it had not received the updated
git-annex branch from B yet). B already had the file, but it still created
a transfer info file from the incoming transfer, and its watcher saw
that be removed, and tried to upload back to C.
These loops should have been fixed by my previous commit. (They never
affected ssh remotes, only local ones, it seemed.) While C might still try
to upload to B, or to some other remote that already has the file, the
extra work dies out there.
This seems to work pretty well.
Handled the process groups like this:
- git-annex processes started by the assistant for transfers are run in their
own process groups.
- otherwise, rely on the shell to allocate a process group for git-annex
There is potentially a problem if some other program runs git-annex
directly (not using sh -c) The program and git-annex would then be in
the same process group. If that git-annex starts a transfer and it's
canceled, the program would also get killed. May or may not be a desired
result.
Also, the new updateTransferInfo probably closes a race where it was
possible for the thread id to not be recorded in the transfer info, if
the transfer info file from the transfer process is read first.
I've convinced myself that nothing in DaemonStatus can deadlock,
as it always keepts the TMVar full. That was the only reason it was in the
Annex monad.
This should fix OSX/BSD issues with not noticing transfer information
files with kqueue. Now that threads are used, the thread can manage the
transfer slot allocation and deallocation by itself; much cleaner.
There's still a bug; if the child updates its transfer info file,
then the data from it will superscede the TransferInfo, losing the
info that we should wait on this child.