usleep_range is a finer precision implementations of msleep
and is designed to be a drop-in replacement for udelay where
a precise sleep / busy-wait is unnecessary.
Since an easy interface to hrtimers could lead to an undesired
proliferation of interrupts, we provide only a "range" API,
forcing the caller to think about an acceptable tolerance on
both ends and hopefully avoiding introducing another interrupt.
INTRO
As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
precise enough for many drivers (yes, sleep precision is an unfair notion,
but consistently sleeping for ~an order of magnitude greater than requested
is worth fixing). This patch adds a usleep API so that udelay does not have
to be used. Obviously not every udelay can be replaced (those in atomic
contexts or being used for simple bitbanging come to mind), but there are
many, many examples of
mydriver_write(...)
/* Wait for hardware to latch */
udelay(100)
in various drivers where a busy-wait loop is neither beneficial nor
necessary, but msleep simply does not provide enough precision and people
are using a busy-wait loop instead.
CONCERNS FROM THE RFC
Why is udelay a problem / necessary? Most callers of udelay are in device/
driver initialization code, which is serial...
As I see it, there is only benefit to sleeping over a delay; the
notion of "refactoring" areas that use udelay was presented, but
I see usleep as the refactoring. Consider i2c, if the bus is busy,
you need to wait a bit (say 100us) before trying again, your
current options are:
* udelay(100)
* msleep(1) <-- As noted above, actually as high as ~20ms
on some platforms, so not really an option
* Manually set up an hrtimer to try again in 100us (which
is what usleep does anyway...)
People choose the udelay route because it is EASY; we need to
provide a better easy route.
Device / driver / boot code is *currently* serial, but every few
months someone makes noise about parallelizing boot, and IMHO, a
little forward-thinking now is one less thing to worry about
if/when that ever happens
udelay's could be preempted
Sure, but if udelay plans on looping 1000 times, and it gets
preempted on loop 200, whenever it's scheduled again, it is
going to do the next 800 loops.
Is the interruptible case needed?
Probably not, but I see usleep as a very logical parallel to msleep,
so it made sense to include the "full" API. Processors are getting
faster (albeit not as quickly as they are becoming more parallel),
so if someone wanted to be interruptible for a few usecs, why not
let them? If this is a contentious point, I'm happy to remove it.
OTHER THOUGHTS
I believe there is also value in exposing the usleep_range option; it gives
the scheduler a lot more flexibility and allows the programmer to express
his intent much more clearly; it's something I would hope future driver
writers will take advantage of.
To get the results in the NUMBERS section below, I literally s/udelay/usleep
the kernel tree; I had to go in and undo the changes to the USB drivers, but
everything else booted successfully; I find that extremely telling in and
of itself -- many people are using a delay API where a sleep will suit them
just fine.
SOME ATTEMPTS AT NUMBERS
It turns out that calculating quantifiable benefit on this is challenging,
so instead I will simply present the current state of things, and I hope
this to be sufficient:
How many udelay calls are there in 2.6.35-rc5?
udealy(ARG) >= | COUNT
1000 | 319
500 | 414
100 | 1146
20 | 1832
I am working on Android, so that is my focus for this. The following table
is a modified usleep that simply printk's the amount of time requested to
sleep; these tests were run on a kernel with udelay >= 20 --> usleep
"boot" is power-on to lock screen
"power collapse" is when the power button is pushed and the device suspends
"resume" is when the power button is pushed and the lock screen is displayed
(no touchscreen events or anything, just turning on the display)
"use device" is from the unlock swipe to clicking around a bit; there is no
sd card in this phone, so fail loading music, video, camera
ACTION | TOTAL NUMBER OF USLEEP CALLS | NET TIME (us)
boot | 22 | 1250
power-collapse | 9 | 1200
resume | 5 | 500
use device | 59 | 7700
The most interesting category to me is the "use device" field; 7700us of
busy-wait time that could be put towards better responsiveness, or at the
least less power usage.
Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
Cc: apw@canonical.com
Cc: corbet@lwn.net
Cc: arjan@linux.intel.com
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Kgdb uses brki r16, 0x18 instruction to call
low level _debug_exception function which save
current state to pt_regs and call microblaze_kgdb_break
function. _debug_exception should be called only from
the kernel space. User space calling is not supported
because user application debugging uses different handling.
pt_regs_to_gdb_regs loads additional special registers
which can't be changed
* Enable KGDB in Kconfig
* Remove ancient not-tested KGDB support
* Remove ancient _debug_exception code from entry.S
Only MMU KGDB support is supported.
Signed-off-by: Michal Simek <monstr@monstr.eu>
CC: Jason Wessel <jason.wessel@windriver.com>
CC: John Williams <john.williams@petalogix.com>
CC: Edgar E. Iglesias <edgar.iglesias@petalogix.com>
CC: linux-kernel@vger.kernel.org
Acked-by: Jason Wessel <jason.wessel@windriver.com>
This is the first patch which add support for
user application debugging through brki rX, 0x18 vector.
This patch has side effect which also remove security issue
to use brki rX, 0x18 to freeze kernel.
Support for old gdb support via priviledged exception
(brk r0, r0) is still there. It will be remove in future.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Syscall can be called only from userspace that's why
we don't need to check which space kernel come from.
Kernel syscall calling is not check and shouldn't come
throught this part of code.
Signed-off-by: Michal Simek <monstr@monstr.eu>
We are not working with values from MSR that's why
we can discard it and use r11 for different purpose without
saving/restoring.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Remove set_vms because UMS is cleared and VMS is already setup.
Optimize function calling which save one additional instruction.
Signed-off-by: Michal Simek <monstr@monstr.eu>
VMS is always setup because VM mode was before
exception/syscall/interrupt. Kernel continues in kernel mode
that's why we have to clear UMS bit if kernel comes from
user space.
Signed-off-by: Michal Simek <monstr@monstr.eu>
PT_MODE stores information if kernel comes from user
or kernel space. If come from user space, PT_MODE
contains 0. If come from kernel store, PT_MODE contains
non zero value. We don't need to save value 1. I am using
r1 register which contains non zero value.
This change save one additional instruction.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Keep together all arguments for send_sig function.
Move returning address to delay slot which is executed.
Remove additional send_sig loading. I am using IMM part of
rtbd instruction with r0.
old solution:
addik r11, r0, send_sig
rtbd r11, 0
nop
new solution:
rtbd r0, send_sig
nop
There is one instruction saving.
Signed-off-by: Michal Simek <monstr@monstr.eu>
It is necessary to setup BIP and EE and clear EIP
only for unaligned exception handler. The rest of
hw exception handlers don't require it.
HW exception occured and we are not in virtual mode.
That's why we can do operations protected by EIP.
Interrupt, next hw exception or syscall can't occur.
EIP is cleared by rted.
This change speedup page_fault hw exception handler
which is critical path.
There is also necessary to save R11 content before
flag setup for unaligned exception.
Signed-off-by: Michal Simek <monstr@monstr.eu>
SAVE_STATE macro is used in hw exceptions high level handling
functions. Hw exception doesn't disable IRQ that's why we don't
need to reenable it.
Signed-off-by: Michal Simek <monstr@monstr.eu>
We don't need to protect by BIP whole ret_from_trap/ret_from_exc code.
Only restoring from user/hw exception should be covered.
If BIP is setup, IRQ can't occur.
Signed-off-by: Michal Simek <monstr@monstr.eu>
There is a way howto remove Kernel Mode variable. It is easier
to parse UMS bit in MSR to find out if I come from kernel or user
space. Loading MSR content should be in one cycle and loading
PER_CPU variable depends on memory state.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Save and restore R3/R4 registers in macros. This change
help to cleanup entry.S.
In ret_from_trap function we are saving returning value from
syscall to pt_regs on stack that's why we don't need to save and
restore these values before kernel functions (schedule, do_signal).
Signed-off-by: Michal Simek <monstr@monstr.eu>
_start symbol stores physical address where kernel is.
Gdb uses this symbol for their purpose that's why
we have to rename it.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Lower shifting values ensure that shifted 32bit counter
value doesn't exceed 64bit cycle variable too fast.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Microblaze has support for early printk. The second serial
driver (uart16550/8250) has no microblaze support for early
printk.
Signed-off-by: Michal Simek <monstr@monstr.eu>
HAVE_ARCH_PCI_SET_DMA_MASK was removed in 2.6.34 (no architecture has
the own implementation of pci_set_dma_mask).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Implement intelligent backtracing by searching for stack frame creation,
and emitting only return addresses. Use print_hex_dump() to display the
entire binary kernel stack.
Limitation: MMU kernels are not currently able to trace beyond a system trap
(interrupt, syscall, etc.). It is the intent of this patch to provide
infrastructure that can be extended to add this capability later.
Changes from V1:
* Removed checks in find_frame_creation() that prevented location of the frame
creation instruction in heavily optimized code
* Various formatting/commenting/file location tweaks per review comments
* Dropped Kconfig option to enable STACKTRACE as something logically separate
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Allow developer to configure memory page size at compile time.
Larger pages can improve performance on some workloads.
Based on PowerPC code.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
sys_clone syscall ignored args which this patch mapped to args
which are passing from glibc.
Here is the origin problem description.
"I ran the static libgcc tests (very few of them are there, they are
mostly dynamically linked) and some of them fail with an assertion in
fork() system call (tid != pid), I looked at the microblaze/entry.S
file and it looks suspicious (ignores arguments 3-5)"
Arg mapping should be:
glibc ARCH_FORK(...) -> do_fork(...)
r5 -> r5 (clone_flags)
r6 -> r6 (stack_start, use parent->stack if NULL)
pt_regs -> r7 (pt_regs)
r7 -> r8 (stack_size)
r8 -> r9 (parent_tidptr)
r9 -> r10 (child_tidptr)
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Microblaze doesn't support frame pointers. Ftrace code
uses CALLER_ADDR1 which is defined in linux/ftrace.h. For Microblaze
is 0.
Signed-off-by: Michal Simek <monstr@monstr.eu>
copy_to_user_page macro is used in mm/memory.c:access_process_vm
function. This function is called from ptrace code (POKETEXT, POKEDATA)
which write data to memory. Microblaze handle physical address for
caches that's why there is virt_to_phys conversion.
There is potential one location which can caused the problem on WB system.
The important is take a look at write PTRACEs requests
(POKE/TEXT, DATA, USR).
Note:
Majority of Microblaze PTRACE code is moved to generic location
in newer kernel version that's why this solution should work on
the newest kernel version too.
linux/io.h is in cacheflush because of mm/nommu.c
Tested on a WB system - hello world debugging.
Signed-off-by: Michal Simek <monstr@monstr.eu>
The label should be remove by
21e1c93631
Warning message:
arch/microblaze/mm/fault.c: In function 'do_page_fault':
arch/microblaze/mm/fault.c:229: warning: label 'survive' defined but not used
Signed-off-by: Michal Simek <monstr@monstr.eu>