Commit graph

135996 commits

Author SHA1 Message Date
Jens Axboe
10cbda97e7 cciss: add BUILD_BUG_ON() for catching bad CommandList_struct alignment
The hardware requires 64-bit alignment of commands, so add a build bug
check for that. The recent commit 8a3173de4a
didn't change the size of the command, but other additions/changes may and
thus break badly at runtime.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-03-24 12:35:16 +01:00
Jens Axboe
a7fcd37cdc block: don't create bio_vec slabs of less than the inline number
If we don't have CONFIG_BLK_DEV_INTEGRITY set, then we don't have
any external dependencies on the bio_vec slabs. So don't create
the ones that we will inline anyway.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-03-24 12:35:16 +01:00
Ingo Molnar
34053979fb block: cleanup bio_alloc_bioset()
this warning (which got fixed by commit b2bf968):

  fs/bio.c: In function ‘bio_alloc_bioset’:
  fs/bio.c:305: warning: ‘p’ may be used uninitialized in this function

Triggered because the code flow in bio_alloc_bioset() is correct
but a bit complex for the compiler to see through.

Streamline it a bit - this also makes the code a tiny bit more compact:

   text	   data	    bss	    dec	    hex	filename
   7540	    256	     40	   7836	   1e9c	bio.o.before
   7539	    256	     40	   7835	   1e9b	bio.o.after

Also remove an older compiler-warnings annotation from this function,
it's not needed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-03-24 12:35:16 +01:00
Steven Whitehouse
df3647b245 GFS2: Fix freeze issue
This removes some old code that was causing issues during
filesystem freeze.

Reported-by: Andrew Price <andy@andrewprice.me.uk>
Tested-by: Andrew Price <andy@andrewprice.me.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:31:30 +00:00
Steven Whitehouse
9c538837d8 Fix a minor bug in the previous patch
The logic requires that we mark the glock dirty in page_mkwrite
otherwise we might not flush correctly in the case that no
allocation was required in the process of dirying the page.
Also we need to set the shared write flag early for the same
reason.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:27 +00:00
Steven Whitehouse
6bac243f07 GFS2: Clean up of glops.c
This cleans up a number of bits of code mostly based in glops.c.
A couple of simple functions have been merged into the callers
to make it more obvious what is going on, the mysterious raising
of i_writecount around the truncate_inode_pages() call has been
removed. The meta_go_* operations have been renamed rgrp_go_*
since that is the only lock type that they are used with.

The unused argument of gfs2_read_sb has been removed. Also
a bug has been fixed where a check for the rindex inode was
in the wrong callback. More comments are added, and the
debugging code is improved too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:27 +00:00
Benjamin Marzinski
02ffad08e8 GFS2: Fix locking bug in failed shared to exclusive conversion
After calling out to the dlm, GFS2 sets the new state of a glock to
gl_target in gdlm_ast().  However, gl_target is not always the lock
state that was requested. If a conversion from shared to exclusive
fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead
of gl->gl_target, so that it can reacquire the lock in exlusive the next
time around.  In this case, setting the lock to gl_target in gdlm_ast()
will make GFS2 think that it has the glock in exclusive mode, when
really, it doesn't have the glock locked at all.  This patch adds a new
field to the gfs2_glock structure, gl_req, to track the mode that was
requested.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:26 +00:00
Hisashi Hifumi
229615def3 GFS2: Pagecache usage optimization on GFS2
I introduced "is_partially_uptodate" aops for GFS2.

A page can have multiple buffers and even if a page is not uptodate, some buffers
can be uptodate on pagesize != blocksize environment.
This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.
"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read after
random write workloads can be optimized and we can get performance improvement.

I did a performance test using the sysbench.

#sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1
--file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0
--file-rw-ratio=1 run

-2.6.29-rc6
Test execution summary:
    total time:                          202.6389s
    total number of events:              200000
    total time taken by event execution: 2580.0480
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0129s
         max:                            49.5852s
         approx.  95 percentile:         0.0462s

-2.6.29-rc6-patched
Test execution summary:
    total time:                          177.8639s
    total number of events:              200000
    total time taken by event execution: 2419.0199
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0121s
         max:                            52.4306s
         approx.  95 percentile:         0.0444s

arch: ia64
pagesize: 16k
blocksize: 4k

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:25 +00:00
Hannes Eder
02ab172159 GFS2: fix sparse warning: Should it be static?
Impact: Make symbol static.

Fix this sparse warning:
  fs/gfs2/rgrp.c:188:5: warning: symbol 'gfs2_bitfit' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:25 +00:00
Hannes Eder
075ac44875 GFS2: fix sparse warnings: constant is so big it is ...
Fix this sparse warnings:
  fs/gfs2/rgrp.c:156:23: warning: constant 0xffffffffffffffff is so big it is unsigned long long
  fs/gfs2/rgrp.c:157:23: warning: constant 0xaaaaaaaaaaaaaaaa is so big it is unsigned long long
  fs/gfs2/rgrp.c:158:23: warning: constant 0x5555555555555555 is so big it is long long
  fs/gfs2/rgrp.c:194:20: warning: constant 0x5555555555555555 is so big it is long long
  fs/gfs2/rgrp.c:204:44: warning: constant 0x5555555555555555 is so big it is long long

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:24 +00:00
Steven Whitehouse
b9a9694570 GFS2: Support quota/noquota mount arguments
This adds support for "quota" and "noquota" mount options in addition to the
existing "quota=on/off/account" so that we are compatible with the names by
which these options are more generally known.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:23 +00:00
Steven Whitehouse
223b2b889f GFS2: Fix alignment issue and tidy gfs2_bitfit
An alignment issue with the existing bitfit algorithm was reported
on IA64. This patch attempts to fix that, and also to tidy up the
code a bit. There is now more documentation about how this works
and it has survived a number of different tests.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:22 +00:00
Steven Whitehouse
64d576ba23 GFS2: Add a "demote a glock" interface to sysfs
This adds a sysfs file called demote_rq to GFS2's
per filesystem directory. Its possible to use this
file to demote arbitrary glocks in exactly the same
way as if a request had come in from a remote node.

This is intended for testing issues relating to caching
of data under glocks. Despite that, the interface is
generic enough to send requests to any type of glock,
but be careful as its not always safe to send an
arbitrary message to an arbitrary glock. For that reason
and to prevent DoS, this interface is restricted to root
only.

The messages look like this:

<type>:<glocknumber> <mode>

Example:

echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq

Which means "please demote inode glock (type 2) number 13324 so that
I can get an EX (exclusive) lock". The lock modes are those which
would normally be sent by a remote node in its callback so if you
want to unlock a glock, you use EX, to demote to shared, use SH or PR
(depending on whether you like GFS2 or DLM lock modes better!).

If the glock doesn't exist, you'll get -ENOENT returned. If the
arguments don't make sense, you'll get -EINVAL returned.

The plan is that this interface will be used in combination with
the blktrace patch which I recently posted for comments although
it is, of course, still useful in its own right.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:22 +00:00
Steven Whitehouse
02e3cc70ec GFS2: Expose UUID via sysfs/uevent
Since we have a UUID, we ought to expose it to the user via sysfs
and uevents. We already have the fs name in both of these places
(a combination of the lock proto and lock table name) so if we add
the UUID as well, we have a full set.

For older filesystems (i.e. those created before mkfs.gfs2 was writing
UUIDs by default) the sysfs file will appear zero length, and no UUID
env var will be added to the uevents.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:21 +00:00
Steven Whitehouse
f15ab5619d GFS2: Support generation of discard requests
This patch allows GFS2 to generate discard requests for blocks which are
no longer useful to the filesystem (i.e. those which have been freed as
the result of an unlink operation). The requests are generated at the
time which those blocks become available for reuse in the filesystem.

In order to use this new feature, you have to specify the "discard"
mount option. The code coalesces adjacent blocks into a single extent
when generating the discard requests, thus generating the minimum
number.

If an error occurs when the request has been sent to the block device,
then it will print a message and turn off the requests for that
filesystem. If the problem is temporary, then you can use remount to
turn the option back on again. There is also a nodiscard mount option
so that you can use remount to turn discard requests off, if required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:20 +00:00
Steven Whitehouse
d8348de06f GFS2: Fix deadlock on journal flush
This patch fixes a deadlock when the journal is flushed and there
are dirty inodes other than the one which caused the journal flush.
Originally the journal flushing code was trying to obtain the
transaction glock while running the flush code for an inode glock.
We no longer require the transaction glock at this point in time
since we know that any attempt to get the transaction glock from
another node will result in a journal flush. So if we are flushing
the journal, we can be sure that the transaction lock is still
cached from when the transaction was started.

By inlining a version of gfs2_trans_begin() (minus the bit which
gets the transaction glock) we can avoid the deadlock problems
caused if there is a demote request queued up on the transaction
glock.

In addition I've also moved the umount rwsem so that it covers
the glock workqueue, since it all demotions are done by this
workqueue now. That fixes a bug on umount which I came across
while fixing the original problem.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:18 +00:00
Steven Whitehouse
e7c8707ea2 GFS2: Fix error path ref counting for root inode
We were keeping hold of an extra ref to the root inode in one
of the error paths, that resulted in a hang.

Reported-by: Nate Straz <nstraz@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Robert Peterson <rpeterso@redhat.com>
2009-03-24 11:21:17 +00:00
Steven Whitehouse
ac2425e7d3 GFS2: Remove unused field from glock
The time stamp field is unused in the glock now that we are
using a shrinker, so that we can remove it and save sizeof(unsigned long)
bytes in each glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:17 +00:00
Steven Whitehouse
f057f6cdf6 GFS2: Merge lock_dlm module into GFS2
This is the big patch that I've been working on for some time
now. There are many reasons for wanting to make this change
such as:
 o Reducing overhead by eliminating duplicated fields between structures
 o Simplifcation of the code (reduces the code size by a fair bit)
 o The locking interface is now the DLM interface itself as proposed
   some time ago.
 o Fewer lookups of glocks when processing replies from the DLM
 o Fewer memory allocations/deallocations for each glock
 o Scope to do further optimisations in the future (but this patch is
   more than big enough for now!)

Please note that (a) this patch relates to the lock_dlm module and
not the DLM itself, that is still a separate module; and (b) that
we retain the ability to build GFS2 as a standalone single node
filesystem with out requiring the DLM.

This patch needs a lot of testing, hence my keeping it I restarted
my -git tree after the last merge window. That way, this has the maximum
exposure before its merged. This is (modulo a few minor bug fixes) the
same patch that I've been posting on and off the the last three months
and its passed a number of different tests so far.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:14 +00:00
Steven Whitehouse
22077f57de GFS2: Remove "double" locking in quota
We only really need a single spin lock for the quota data, so
lets just use the lru lock for now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2009-03-24 11:21:13 +00:00
Abhijith Das
0a7ab79c5b GFS2: change gfs2_quota_scan into a shrinker
Deallocation of gfs2_quota_data objects now happens on-demand through a
shrinker instead of routinely deallocating through the quotad daemon.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:12 +00:00
Abhijith Das
2db2aac255 GFS2: Bring back lvb-related stuff to lock_nolock to support quotas
The quota code uses lvbs and this is currently not implemented in
lock_nolock, thereby causing panics when quota is enabled with
lock_nolock. This patch adds the relevant bits.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:11 +00:00
Steven Whitehouse
6f04c1c7fe GFS2: Fix remount argument parsing
The following patch fixes an issue relating to remount and argument
parsing. After this fix is applied, remount becomes atomic in that
it either succeeds changing the mount to the new state, or it fails
and leaves it in the old state. Previously it was possible for the
parsing of options to fail part way though and for the fs to be left
in a state where some of the new arguments had been applied, but some
had not.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:10 +00:00
Avi Kivity
16175a796d KVM: VMX: Don't allow uninhibited access to EFER on i386
vmx_set_msr() does not allow i386 guests to touch EFER, but they can still
do so through the default: label in the switch.  If they set EFER_LME, they
can oops the host.

Fix by having EFER access through the normal channel (which will check for
EFER_LME) even on i386.

Reported-and-tested-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:15 +02:00
Sheng Yang
bc7a8660df KVM: Correct deassign device ioctl to IOW
It's IOR by mistake, so fix it before release.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:15 +02:00
Liu Yu
046a48b35b KVM: ppc: e500: Fix the bug that KVM is unstable in SMP
TLB entry should enable memory coherence in SMP.

And like commit 631fba9dd3aca519355322cef035730609e91593,
remove guard attribute to enable the prefetch of guest memory.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:15 +02:00
Liu Yu
bc35cbc85c KVM: ppc: e500: Fix the bug that mas0 update to wrong value when read TLB entry
Should clear and then update the next victim area here.

Guest kernel only read TLB1 when startup kernel,
this bug result in an extra 4K TLB1 mapping in guest from 0x0 to 0x0.

As the problem has no impact to bootup a guest,
we didn't notice it before.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:14 +02:00
Andrea Arcangeli
4539b35881 KVM: Fix missing smp tlb flush in invlpg
When kvm emulates an invlpg instruction, it can drop a shadow pte, but
leaves the guest tlbs intact.  This can cause memory corruption when
swapping out.

Without this the other cpu can still write to a freed host physical page.
tlb smp flush must happen if rmap_remove is called always before mmu_lock
is released because the VM will take the mmu_lock before it can finally add
the page to the freelist after swapout. mmu notifier makes it safe to flush
the tlb after freeing the page (otherwise it would never be safe) so we can do
a single flush for multiple sptes invalidated.

Cc: stable@kernel.org
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:14 +02:00
Sheng Yang
36463146ff KVM: Get support IRQ routing entry counts
In capability probing ioctl.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:14 +02:00
Hannes Eder
cded19f396 KVM: fix sparse warnings: Should it be static?
Impact: Make symbols static.

Fix this sparse warnings:
  arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static?
  arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static?
  arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static?
  arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static?
  arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static?
  virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:14 +02:00
Hannes Eder
d7364a29b3 KVM: fix sparse warnings: context imbalance
Impact: Attribute function with __acquires(...) resp. __releases(...).

Fix this sparse warnings:
  arch/x86/kvm/i8259.c:34:13: warning: context imbalance in 'pic_lock' - wrong count at exit
  arch/x86/kvm/i8259.c:39:13: warning: context imbalance in 'pic_unlock' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:13 +02:00
Amit Shah
41d6af1192 KVM: is_long_mode() should check for EFER.LMA
is_long_mode currently checks the LongModeEnable bit in
EFER instead of the LongModeActive bit. This is wrong, but
we survived this till now since it wasn't triggered. This
breaks guests that go from long mode to compatibility mode.

This is noticed on a solaris guest and fixes bug #1842160

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2009-03-24 11:03:13 +02:00
Amit Shah
401d10dee0 KVM: VMX: Update necessary state when guest enters long mode
setup_msrs() should be called when entering long mode to save the
shadow state for the 64-bit guest state.

Using vmx_set_efer() in enter_lmode() removes some duplicated code
and also ensures we call setup_msrs(). We can safely pass the value
of shadow_efer to vmx_set_efer() as no other bits in the efer change
while enabling long mode (guest first sets EFER.LME, then sets CR0.PG
which causes a vmexit where we activate long mode).

With this fix, is_long_mode() can check for EFER.LMA set instead of
EFER.LME and 5e23049e86dd298b72e206b420513dbc3a240cd9 can be reverted.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:13 +02:00
Xiantao Zhang
6b08035f3e KVM: ia64: Fix the build errors due to lack of macros related to MSI.
Include the newly introduced msidef.h to solve the build issues.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:13 +02:00
Xiantao Zhang
2fa8937f3a ia64: Move the macro definitions related to MSI to one header file.
For kvm's MSI support, it needs these macros defined in ia64_msi.c, and
to avoid duplicate them, move them to one header file and share with
kvm.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:12 +02:00
Weidong Han
4a906e49f1 KVM: fix kvm_vm_ioctl_deassign_device
only need to set assigned_dev_id for deassignment, use
match->flags to judge and deassign it.

Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:12 +02:00
Weidong Han
2df8a40bcc KVM: define KVM_CAP_DEVICE_DEASSIGNMENT
define KVM_CAP_DEVICE_DEASSIGNMENT and KVM_DEASSIGN_PCI_DEVICE
for device deassignment.

the ioctl has been already implemented in the
commit: 0a92035674

Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:12 +02:00
Liu Yu
b0a1835d53 KVM: ppc: Add emulation of E500 register mmucsr0
Latest kernel flushes TLB via mmucsr0.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:12 +02:00
Gleb Natapov
71450f7885 KVM: Report IRQ injection status for MSI delivered interrupts
Return number of CPUs interrupt was successfully injected into or -1 if
none.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:11 +02:00
Joerg Roedel
c5bc224240 KVM: MMU: Fix another largepage memory leak
In the paging_fetch function rmap_remove is called after setting a large
pte to non-present. This causes rmap_remove to not drop the reference to
the large page. The result is a memory leak of that page.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:11 +02:00
Andre Przywara
1fbdc7a585 KVM: SVM: set accessed bit for VMCB segment selectors
In the segment descriptor _cache_ the accessed bit is always set
(although it can be cleared in the descriptor itself). Since Intel
checks for this condition on a VMENTRY, set this bit in the AMD path
to enable cross vendor migration.

Cc: stable@kernel.org
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Acked-By: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:11 +02:00
Gleb Natapov
4925663a07 KVM: Report IRQ injection status to userspace.
IRQ injection status is either -1 (if there was no CPU found
that should except the interrupt because IRQ was masked or
ioapic was misconfigured or ...) or >= 0 in that case the
number indicates to how many CPUs interrupt was injected.
If the value is 0 it means that the interrupt was coalesced
and probably should be reinjected.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:11 +02:00
Joerg Roedel
452425dbaa KVM: MMU: remove assertion in kvm_mmu_alloc_page
The assertion no longer makes sense since we don't clear page tables on
allocation; instead we clear them during prefetch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:10 +02:00
Joerg Roedel
6bed6b9e84 KVM: MMU: remove redundant check in mmu_set_spte
The following code flow is unnecessary:

	if (largepage)
		was_rmapped = is_large_pte(*shadow_pte);
	 else
	 	was_rmapped = 1;

The is_large_pte() function will always evaluate to one here because the
(largepage && !is_large_pte) case is already handled in the first
if-clause. So we can remove this check and set was_rmapped to one always
here.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:10 +02:00
Joerg Roedel
fc5659c8c6 KVM: MMU: handle compound pages in kvm_is_mmio_pfn
The function kvm_is_mmio_pfn is called before put_page is called on a
page by KVM. This is a problem when when this function is called on some
struct page which is part of a compund page. It does not test the
reserved flag of the compound page but of the struct page within the
compount page. This is a problem when KVM works with hugepages allocated
at boot time. These pages have the reserved bit set in all tail pages.
Only the flag in the compount head is cleared. KVM would not put such a
page which results in a memory leak.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:09 +02:00
Gerd Hoffmann
c807660407 KVM: Fix kvmclock on !constant_tsc boxes
kvmclock currently falls apart on machines without constant tsc.
This patch fixes it.  Changes:

  * keep tsc frequency in a per-cpu variable.
  * handle kvmclock update using a new request flag, thus checking
    whenever we need an update each time we enter guest context.
  * use a cpufreq notifier to track frequency changes and force
    kvmclock updates.
  * send ipis to kick cpu out of guest context if needed to make
    sure the guest doesn't see stale values.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:09 +02:00
Sheng Yang
49cd7d2238 KVM: VMX: Use kvm_mmu_page_fault() handle EPT violation mmio
Removed duplicated code.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:09 +02:00
Sheng Yang
79950e1073 KVM: Use irq routing API for MSI
Merge MSI userspace interface with IRQ routing table. Notice the API have been
changed, and using IRQ routing table would be the only interface kvm-userspace
supported.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:09 +02:00
Jan Kiszka
34c33d163f KVM: Drop unused evaluations from string pio handlers
Looks like neither the direction nor the rep prefix are used anymore.
Drop related evaluations from SVM's and VMX's I/O exit handlers.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:08 +02:00
Alexander Graf
1b2fd70c4e KVM: Add FFXSR support
AMD K10 CPUs implement the FFXSR feature that gets enabled using
EFER. Let's check if the virtual CPU description includes that
CPUID feature bit and allow enabling it then.

This is required for Windows Server 2008 in Hyper-V mode.

v2 adds CPUID capability exposure

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:08 +02:00