The tsc_offset adjustment in svm_vcpu_load is executed
unconditionally even if Linux considers the host tsc as
stable. This causes a Linux guest detecting an unstable tsc
in any case.
This patch removes the tsc_offset adjustment if the host tsc
is stable. The guest will now get the benefit of a stable
tsc too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Before enabling, execution of "rdtscp" in guest would result in #UD.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sometime, we need to adjust some state in order to reflect guest CPUID
setting, e.g. if we don't expose rdtscp to guest, we won't want to enable
it on hardware. cpuid_update() is introduced for this purpose.
Also export kvm_find_cpuid_entry() for later use.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
KVM need vsyscall_init() to initialize MSR_TSC_AUX before it read the value.
Per Avi's suggestion, this patch raised vsyscall priority on hotplug notifier
chain, to 30.
CC: Ingo Molnar <mingo@elte.hu>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
shared_msr_global saved host value of relevant MSRs, but it have an
assumption that all MSRs it tracked shared the value across the different
CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX.
Extend it to per CPU to provide the support of MSR_TSC_AUX, and more
alike MSRs.
Notice now the shared_msr_global still have one assumption: it can only deal
with the MSRs that won't change in host after KVM module loaded.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
ept_update_paging_mode_cr4() accesses vcpu->arch.cr4 directly, which usually
needs to be accessed via kvm_read_cr4(). In this case, we can't, since cr4
is in the process of being updated. Instead of adding inane comments, fold
the function into its caller (vmx_set_cr4), so it can use the not-yet-committed
cr4 directly.
Signed-off-by: Avi Kivity <avi@redhat.com>
We make no use of cr4.pge if ept is enabled, but the guest does (to flush
global mappings, as with vmap()), so give the guest ownership of this bit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of specifying the bits which we want to trap on, specify the bits
which we allow the guest to change transparently. This is safer wrt future
changes to cr4.
Signed-off-by: Avi Kivity <avi@redhat.com>
Some bits of cr4 can be owned by the guest on vmx, so when we read them,
we copy them to the vcpu structure. In preparation for making the set of
guest-owned bits dynamic, use helpers to access these bits so we don't need
to know where the bit resides.
No changes to svm since all bits are host-owned there.
Signed-off-by: Avi Kivity <avi@redhat.com>
We don't support these instructions, but guest can execute them even if the
feature('monitor') haven't been exposed in CPUID. So we would trap and inject
a #UD if guest try this way.
Cc: stable@kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In the past we've had errors of single-bit in the other two cases; the
printk() may confirm it for the third case (many->many).
Signed-off-by: Avi Kivity <avi@redhat.com>
Windows 2003 uses task switch to triple fault and reboot (the other
exception being reserved pdptrs bits).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Some Toshibas only send ACPI events on key down, not key release. Ignore
any release events and send key down and key up events on every ACPI key
down event.
Signed-off-by: Frans Pop <elendil@planet.nl>
Make sure that work is cancelled after removing the i8042 filter, and
unregister the platform device rather than deleting it.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The following build bug (x86, allyesconfig):
arch/x86/oprofile/built-in.o:(.data+0x250): multiple definition of `buffer_mutex'
Was triggered in -tip testing, caused by this upstream commit:
116ee77: dell-laptop: Use buffer with 32-bit physical address
There's multiple buffer_mutex's in the kernel. Make this new one
static.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes glock numbers from printing in decimal to hex.
Since DLM prints corresponding resource IDs in hex, it makes debugging
easier.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When we queue data buffers for ordered write, the buffers are added
to the head of the ordered write list. When the log needs to push
these buffers to disk, it also walks the list from the head. The
result is that the the ordered buffers are submitted to disk in
reverse order.
For large writes, this means that whenever the log flushes large
streams of reverse sequential order buffers are pushed down into the
block layers. The elevators don't handle this particularly well, so
IO rates tend to be significantly lower than if the IO was issued in
ascending block order.
Queue new ordered buffers to the tail of the ordered buffer list to
ensure that IO is dispatched in the order it was submitted. This
should significantly improve large sequential write speeds. On a
disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for
noop and from 38MB/s to 50MB/s for cfq.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is the kernel portion of the patch-set for upstream gfs2,
to remove the quota-linked-list stuff and replace it with
fiemap-based traversal of the quota file.
The corresponding userland fixes have been pushed to
STABLE3 and master branches of cluster.git and gfs2-utils.git
respectively (Refer Red Hat bug #536902).
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.
Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since the start of GFS2, an "extra" inode has been used to store
the metadata belonging to each inode. The only reason for using
this inode was to have an extra address space, the other fields
were unused. This means that the memory usage was rather inefficient.
The reason for keeping each inode's metadata in a separate address
space is that when glocks are requested on remote nodes, we need to
be able to efficiently locate the data and metadata which relating
to that glock (inode) in order to sync or sync and invalidate it
(depending on the remotely requested lock mode).
This patch adds a new type of glock, which has in addition to
its normal fields, has an address space. This applies to all
inode and rgrp glocks (but to no other glock types which remain
as before). As a result, we no longer need to have the second
inode.
This results in three major improvements:
1. A saving of approx 25% of memory used in caching inodes
2. A removal of the circular dependency between inodes and glocks
3. No confusion between "normal" and "metadata" inodes in super.c
Although the first of these is the more immediately apparent, the
second is just as important as it now enables a number of clean
ups at umount time. Those will be the subject of future patches.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
omapfb has several custom ioctls so user space needs
the header in order to utilize them.
Signed-off-by: Ville Syrjälä <ville.syrjala@nokia.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
radeon was always including the atpx code unnecessarily, also core
switcheroo was including acpi headers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Type of iterator was promoted to unsigned long in 64bit systems.
*header is small structure so it is alwas safe to cast return value
of sizeof operator to int.
Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
As the comment says the initial value of last_waited is never used, so
there is no need to initialise it with the current jiffies. Jiffies is
hot enough without accessing it for no reason.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Reorder cfq_rb_root to remove 8 bytes of padding on 64 bit builds.
Consequently removing 56 bytes from cfq_group and 64 bytes from
cfq_data.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently a queue can only dispatch up to 4 requests if there are other queues.
This isn't optimal, device can handle more requests, for example, AHCI can
handle 31 requests. I can understand the limit is for fairness, but we could
do a tweak: if the queue still has a lot of slice left, sounds we could
ignore the limit. Test shows this boost my workload (two thread randread of
a SSD) from 78m/s to 100m/s.
Thanks for suggestions from Corrado and Vivek for the patch.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Many new laptops now come with 2 gpus, one to be used for low power
modes and one for gaming/on-ac applications. These GPUs are typically
wired to the laptop panel and VGA ports via a multiplexer unit which
is controlled via ACPI methods.
4 combinations of systems typically exist - with 2 ACPI methods.
Intel/ATI - Lenovo W500/T500 - use ATPX ACPI method
ATI/ATI - some ASUS - use ATPX ACPI Method
Intel/Nvidia - - use _DSM ACPI method
Nvidia/Nvidia - - use _DSM ACPI method.
TODO:
This patch adds support for the ATPX method and initial bits
for the _DSM methods that need to written by someone with
access to the hardware.
Add a proper non-debugfs interface - need to get some proper
testing first.
v2: add power up/down support for both devices
on W500 puts i915/radeon into D3 and cuts power to radeon.
v3: redo probing methods, no DMI list, drm devices call to
register with switcheroo, it tries to find an ATPX method on
any device and once there is two devices + ATPX it inits the
switcher.
v4: ATPX msg handling using buffers - should work on more machines
v5: rearchitect after more mjg59 discussion - move ATPX handling to
radeon driver.
v6: add file headers + initial nouveau bits (to be filled out).
v7: merge delayed switcher code.
v8: avoid suspend/resume of gpu that is off
v9: rearchitect - mjg59 is always right. - move all ATPX code to
radeon, should allow simpler DSM also proper ATRM handling
v10: add ATRM support for radeon BIOS, add mutex to lock vgasr_priv
v11: fix bug in resuming Intel for 2nd time.
v12: start fixing up nvidia code blindly.
v13: blindly guess at finishing nvidia code
v14: remove radeon audio hacks - fix up intel resume more like upstream
v15: clean up printks + remove unnecessary igd/dis pointers
mount debugfs
/sys/kernel/debug/vgaswitcheroo/switch - should exist if ATPX detected
+ 2 cards.
DIS - immediate change to discrete
IGD - immediate change to IGD
DDIS - delayed change to discrete
DIGD - delayed change to IGD
ON - turn on not in use
OFF - turn off not in use
Tested on W500 (Intel/ATI) and T500 (Intel/ATI)
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit "disabled" audio on RV710 and RV740 only, leaving RV770 and RV730.
The order is: CHIP_RV770 < CHIP_RV730 < CHIP_RV710 < CHIP_RV740.
It is not needed anway, as we do not even try to enable audio on RV770 and
newer. We call initializing function in r600.c only, not in rv770.c.
If there is something causing green tinges, it's HDMI mode setting for encoder
and I will try to debug that.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'drm-radeon-testing' of /ssd/git/drm-radeon-next:
drm/radeon: r100/r200 ums: block ability for userspace app to trash 0 page and beyond
drm/ttm: fix function prototype to match implementation
drm/radeon: use ALIGN instead of open coding it
drm/radeon/kms: initialize set_surface_reg reg for rs600 asic
radeon's have a special ability to passthrough writes in their internal
memory space directly to PCI, this ability means that if some of the internal
surfaces like the depth buffer point at 0x0, any writes to these will
go directly to RAM at 0x0 via PCI busmastering.
Now mesa used to always emit clears after emitting state, since the
radeon mesa driver was refactored a year or more ago, it was found it
could generate a clear request without ever sending any setup state to the
card. So the clear would attempt to clear the depth buffer at 0x0, which
would overwrite main memory at this point. fs corruption ensues.
Also once one app did this correctly, it would never get set back to 0
making this messy to reproduce.
The kernel should block this from happening as mesa runs without privs,
though it does require the user be connected to the current running X session.
This patch implements a check to make sure the depth offset has been set
before a depth clear occurs and if it finds one it prints a warning and
ignores the depth clear request. There is also a mesa fix to avoid sending
the badness going into mesa.
This only affects r100/r200 GPUs in user modesetting mode.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix function prototype to match its actual usage and implementation.
drivers/gpu/drm/ttm/ttm_bo_util.c:341:10: error: symbol 'ttm_io_prot' redeclared with different type (originally declared at include/drm/ttm/ttm_bo_driver.h:911) - incompatible argument 1 (different signedness)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Dave Airlie <airlied@redhat.com>