kmmio.c handles the list of mmio probes with callbacks, list of traced
pages, and attaching into the page fault handler and die notifier. It
arms, traps and disarms the given pages, this is the core of mmiotrace.
mmio-mod.c is a user interface, hooking into ioremap functions and
registering the mmio probes. It also decodes the required information
from trapped mmio accesses via the pre and post callbacks in each probe.
Currently, hooking into ioremap functions works by redefining the symbols
of the target (binary) kernel module, so that it calls the traced
versions of the functions.
The most notable changes done since the last discussion are:
- kmmio.c is a built-in, not part of the module
- direct call from fault.c to kmmio.c, removing all dynamic hooks
- prepare for unregistering probes at any time
- make kmmio re-initializable and accessible to more than one user
- rewrite kmmio locking to remove all spinlocks from page fault path
Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe()
or is there a better way?
The function called via call_rcu() itself calls call_rcu() again,
will this work or break? There I need a second grace period for RCU
after the first grace period for page faults.
Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack
that next. At some point I will start looking into how to make mmiotrace
a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should
make the user space part of mmiotracing as simple as
'cat /debug/trace/mmio > dump.txt'.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The custom page fault handler list is replaced with a single function
pointer. All related functions and variables are renamed for
mmiotrace.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: pq@iki.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Provides kernel modules a way to register custom page fault handlers.
On every page fault this will call a list of registered functions. The
functions may handle the fault and force do_page_fault() to return
immediately.
This functionality is similar to the now removed page fault notifiers.
Custom page fault handlers are used by debugging and reverse engineering
tools. Mmiotrace is one such tool and a patch to add it into the tree
will follow.
The custom page fault handlers are called earlier in do_page_fault()
than the page fault notifiers were.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch patches the call to mcount with nops instead
of a jmp over the mcount call.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the notrace annotations to the vsyscall functions - there we are
not in kernel context yet, so the tracer function cannot (and must not)
be called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 2d474871e2fb092eb46a0930aba5442e10eb96cc
Author: Mike Travis <travis@sgi.com>
Date: Mon May 12 21:21:13 2008 +0200
CR4 manipulation is not protected against interrupts and preemption,
but KVM uses smp_function_call to manipulate the X86_CR4_VMXE bit
either from the CPU hotplug code or from the kvm_init call.
We need to protect the CR4 manipulation from both interrupts and
preemption.
Original bug report: http://lkml.org/lkml/2008/5/7/48
Bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=10642
This is not a regression from 2.6.25, it's a long standing and hard to
trigger bug.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used. xen-unstable has now officially dropped
non-PAE support. Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The x86_64 pgd_bad(), pud_bad(), pmd_bad() inlines have differed from
their x86_32 counterparts in a couple of ways: they've been unnecessarily
weak (e.g. letting 0 or 1 count as good), and were typed as unsigned long.
Strengthen them and return int.
The PAE pmd_bad was too weak before, allowing any junk in the upper half;
but got strengthened by the patch correcting its ~PAGE_MASK to ~PTE_MASK.
The PAE pud_bad already said ~PTE_MASK; and since it folds into pgd_bad,
and we don't set the protection bits at that level, it'll do as is.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use PTE_MASK to extract mfn from pte.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use ~PTE_MASK to extract the non-pfn parts of the pte (ie, the pte
flags), rather than constructing an ad-hoc mask.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
_PAGE_CHG_MASK is defined as the set of bits not updated by
pte_modify(); specifically, the pfn itself, and the Accessed and Dirty
bits (which are updated by hardware).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Put the definitions of __(VIRTUAL|PHYSICAL)_MASK before their uses.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the warning:
include2/asm/pgtable.h: In function `pte_modify':
include2/asm/pgtable.h:290: warning: left shift count >= width of type
On 32-bit PAE the virtual and physical addresses are both 32-bits,
so it ends up evaluating 1<<32. Do the shift as a 64-bit shift then
cast to the appropriate size. This should all be done at compile time,
and so have no effect on generated code.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define PTE_MASK so that it contains a meaningful value for all x86
pagetable configurations. Previously it was defined as a "long" which
means that it was too short to cover a 32-bit PAE pte entry.
It is now defined as a pteval_t, which is an integer type long enough
to contain a full pte (or pmd, pud, pgd).
This fixes an Xorg crash on 32-bit x86 with PAE due to corruption of the
NX bit in mprotect due to the incorrect type/value of PTE_MASK reported
by Hugh Dickins:
"Yes, thanks Jeremy: I've checked that each stage builds and runs X on
my boxes here, x86_32 and x86_32+PAE and x86_64. (So even 1/8 is
enough to fix the PAT pte_modify issue, though 2/8 then fixes
compiler warnings.)"
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A register destination encoded with a mod=3 encoding left dst.ptr NULL.
Normally we don't trap writes to registers, but in the case of smsw, we do.
Fix by pointing dst.ptr at the destination register.
Signed-off-by: Avi Kivity <avi@qumranet.com>
There is a defect in mprotect, which lets the user change the page cache
type bits by-passing the kernel reserve_memtype and free_memtype
wrappers. Fix the problem by not letting mprotect change the PAT bits.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup gart handling on amd64 a bit: move common code into
enable_gart_translation , and use symbolic register names where
appropriate.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move symbolic constants into gart.h, and use them instead of hardcoded
constant.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
.. allowing the former to be use in non-PAE kernels, too.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The SGI UV system needs several more system vectors than a vanilla
x86_64 system. Rather than burden the other archs with extra system
vectors that they don't use, change FIRST_SYSTEM_VECTOR to a variable,
so that it can be dynamic.
Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The interrupt vector defines are copied 4 times around with minimal
differences. Move them all into asm-x86/irq_vectors.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the common declarations from hw_irq_32/64 into hw_irq.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On machines with very large numbers of cpus, tables that are dimensioned
by NR_IRQS get very large, especially the irq_desc table. They are also
very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
to set NR_IRQS, otherwise use NR_CPUS.
Signed-off-by: Alan Mayer <ajm@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make them similar so that both use THREAD_ORDER and THREAD_FLAGS and have a
THREAD_SIZE definition that is setup in asm/page_xx.h
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The TIF masks are basically the same. x86_32 also has _TIF_SYSCALL_EMU which is
zero for the 64 bit case. The tif masks become the same.
x86_64 has an additional _TIF_DONOTIFY_MASK. Does not hurt for the 32 bit case
since it is only used in x86_64 arch code.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both TIF lists are essentially the same. x86_32 also has TIF_SYSCALL_EMU which
must be undefined for the 64 bit case.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Same for both 32 and 64 bit so simply put it in to the common area.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both definitions are the same. So move to common x86 area.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge the thread_info definition into one structure definition for both arches.
The __u32 is equal to unsigned long for 32 bit.
sysenter_return is used both for the IA32 emulation for 64 and x86_32.
Avoid complicated #ifdef by simply always including it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge both. x86_64 has an additional TS_COMPAT that is harmless
for 32 bit.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move shared includes to a common area in thread_info.h
Adds asm/types.h for x86_64 and linux/compiler.h for x86_32. Not needed but
we can avoid some ifdeffing and it simplifies later joining.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simple merge of both thread_info_32.h and thread_info_64.h into thread_info.h.
Comments for #ifndef __ASM_THREAD_INFO_H and #ifdef __KERNEL__
are the same.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: rdc: leds build/config fix
x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system)
x86: revert commit 709f744 ("x86: bitops asm constraint fixes")
x86: restrict keyboard io ports reservation to make ipmi driver work
x86: fix fpu restore from sig return
x86: remove spew print out about bus to node mapping
x86: revert printk format warning change which is for linux-next
x86: cleanup PAT cpu validation
x86: geode: define geode_has_vsa2() even if CONFIG_MGEODE_LX is not set
x86: GEODE: cache results from geode_has_vsa2() and uninline
x86: revert geode config dependency