Commit graph

5025 commits

Author SHA1 Message Date
Linus Torvalds
8533ce7271 These are the x86, MIPS and s390 changes; PPC and ARM will come in a
few days.
 
 MIPS and s390 have little going on this release; just bugfixes, some
 small, some larger.
 
 The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations
 for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86
 emulator bugfixes (Nadav Amit).
 
 Stephen Rothwell reported a trivial conflict with the tracing branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJT300XAAoJEBvWZb6bTYby3V8QAJz+XyajnhJ8wH55Vxczz22L
 i2gtUGmBLhEXsBcaVKO4BBfek88lLzg0SGLjfW5wCMQmKtxVlrwTCXNkBoPGjapd
 NwHtWkMKym44PDhRovn7zkSumkxC43uFIBR/ebrhP6Bvhh9s+MnkQUxfw9ILB+YV
 EeKyEG8sSgxFCciuHbp3mIXpDcO6r/ldy6I7009OdyhLoMY+Kvmk7kRe9wtAivdg
 CGJi60QvGOn2RGRPOCEtF6UWr8Ae8fe1t84o0hkXPv/j3jtabzAatXKJa4dYNbIs
 7Mp4NQpxaGV6rq3WCYVeZRxGs+UReGDAS3Il4Z8C9eTOTooSfxdVr8acpM8PY6I8
 UmLT6ECLGycc4ELXrETtR+QLmiXACyJqyVxz4aiLV3kWSWfamKD3hBeQK9NizNcE
 VoPDl+PyISvR1tW4KstBuzfUWAEXi+gO78cqqFr/VW6cl7HKpA1DFQaPfGkYKDae
 2CPwcLwI5/M6RtSgkyXTkEqNZLc2BjldqSeM1lmWjhZVW56X2iqePUL46Vab3Yvt
 U+sELtwEE560NLN3hbaHUsLR1tcUix5w8vTzcXPxgoHQBszHCcAZTWd1XHulr64F
 rp/cangqtkPKcu5j1mNhQs38oLjHI1MUsbQrqFoD4tmHjQ75iXHRFzYGoIVKXyHG
 AnGbQzJzBcdAANhm3LW0
 =UXxV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "These are the x86, MIPS and s390 changes; PPC and ARM will come in a
  few days.

  MIPS and s390 have little going on this release; just bugfixes, some
  small, some larger.

  The highlights for x86 are nested VMX improvements (Jan Kiszka),
  optimizations for old processor (up to Nehalem, by me and Bandan Das),
  and a lot of x86 emulator bugfixes (Nadav Amit).

  Stephen Rothwell reported a trivial conflict with the tracing branch"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (104 commits)
  x86/kvm: Resolve shadow warnings in macro expansion
  KVM: s390: rework broken SIGP STOP interrupt handling
  KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
  KVM: vmx: remove duplicate vmx_mpx_supported() prototype
  KVM: s390: Fix memory leak on busy SIGP stop
  x86/kvm: Resolve shadow warning from min macro
  kvm: Resolve missing-field-initializers warnings
  Replace NR_VMX_MSR with its definition
  KVM: x86: Assertions to check no overrun in MSR lists
  KVM: x86: set rflags.rf during fault injection
  KVM: x86: Setting rflags.rf during rep-string emulation
  KVM: x86: DR6/7.RTM cannot be written
  KVM: nVMX: clean up nested_release_vmcs12 and code around it
  KVM: nVMX: fix lifetime issues for vmcs02
  KVM: x86: Defining missing x86 vectors
  KVM: x86: emulator injects #DB when RFLAGS.RF is set
  KVM: x86: Cleanup of rflags.rf cleaning
  KVM: x86: Clear rflags.rf on emulated instructions
  KVM: x86: popf emulation should not change RF
  KVM: x86: Clearing rflags.rf upon skipped emulated instruction
  ...
2014-08-04 12:16:46 -07:00
Linus Torvalds
b8c0aa46b3 This pull request has a lot of work done. The main thing is the changes
to the ftrace function callback infrastructure. It's introducing a
 way to allow different functions to call directly different trampolines
 instead of all calling the same "mcount" one.
 
 The only user of this for now is the function graph tracer, which always
 had a different trampoline, but the function tracer trampoline was called
 and did basically nothing, and then the function graph tracer trampoline
 was called. The difference now, is that the function graph tracer
 trampoline can be called directly if a function is only being traced by
 the function graph trampoline. If function tracing is also happening on
 the same function, the old way is still done.
 
 The accounting for this takes up more memory when function graph tracing
 is activated, as it needs to keep track of which functions it uses.
 I have a new way that wont take as much memory, but it's not ready yet
 for this merge window, and will have to wait for the next one.
 
 Another big change was the removal of the ftrace_start/stop() calls that
 were used by the suspend/resume code that stopped function tracing when
 entering into suspend and resume paths. The stop of ftrace was done
 because there was some function that would crash the system if one called
 smp_processor_id()! The stop/start was a big hammer to solve the issue
 at the time, which was when ftrace was first introduced into Linux.
 Now ftrace has better infrastructure to debug such issues, and I found
 the problem function and labeled it with "notrace" and function tracing
 can now safely be activated all the way down into the guts of suspend
 and resume.
 
 Other changes include clean ups of uprobe code.
 Clean up of the trace_seq() code.
 And other various small fixes and clean ups to ftrace and tracing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT35zXAAoJEKQekfcNnQGuOz0H/38zqM0nLFhrgvz3EPk2UOjn
 xqpX8qyb2V7TJZL+IqeXU2a5cQZl5ba0D4WtBGpxbTae3CJYiuQ87iKUNFoH0om5
 FDpn80igb368k8V3qRdRsziKVCCf0XBd/NkHJXc0ZkfXGyzB2Ga4bBxALxp2gj9y
 bnO+vKo6+tWYKG4hyQb4P3LRXUrK8/LWEsPr39cH2QH1Rdj69Lx9CgrCdUVJmwcb
 Bj8hEiLXL/RYCFNn79A3wNTUvW0rG/AOIf4SLqXtasSRZ0ToaU0ZyDnrNv+0Ol47
 rX8tSk+LfXchL9hpIvjCf1vlAYq3pO02favteR/jip3lx/dTjEDE4RJ9qtJzZ4Q=
 =fwQY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This pull request has a lot of work done.  The main thing is the
  changes to the ftrace function callback infrastructure.  It's
  introducing a way to allow different functions to call directly
  different trampolines instead of all calling the same "mcount" one.

  The only user of this for now is the function graph tracer, which
  always had a different trampoline, but the function tracer trampoline
  was called and did basically nothing, and then the function graph
  tracer trampoline was called.  The difference now, is that the
  function graph tracer trampoline can be called directly if a function
  is only being traced by the function graph trampoline.  If function
  tracing is also happening on the same function, the old way is still
  done.

  The accounting for this takes up more memory when function graph
  tracing is activated, as it needs to keep track of which functions it
  uses.  I have a new way that wont take as much memory, but it's not
  ready yet for this merge window, and will have to wait for the next
  one.

  Another big change was the removal of the ftrace_start/stop() calls
  that were used by the suspend/resume code that stopped function
  tracing when entering into suspend and resume paths.  The stop of
  ftrace was done because there was some function that would crash the
  system if one called smp_processor_id()! The stop/start was a big
  hammer to solve the issue at the time, which was when ftrace was first
  introduced into Linux.  Now ftrace has better infrastructure to debug
  such issues, and I found the problem function and labeled it with
  "notrace" and function tracing can now safely be activated all the way
  down into the guts of suspend and resume

  Other changes include clean ups of uprobe code, clean up of the
  trace_seq() code, and other various small fixes and clean ups to
  ftrace and tracing"

* tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
  ftrace: Add warning if tramp hash does not match nr_trampolines
  ftrace: Fix trampoline hash update check on rec->flags
  ring-buffer: Use rb_page_size() instead of open coded head_page size
  ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
  tracing: Convert local function_graph functions to static
  ftrace: Do not copy old hash when resetting
  tracing: let user specify tracing_thresh after selecting function_graph
  ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
  tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
  s390/ftrace: remove check of obsolete variable function_trace_stop
  arm64, ftrace: Remove check of obsolete variable function_trace_stop
  Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
  metag: ftrace: Remove check of obsolete variable function_trace_stop
  microblaze: ftrace: Remove check of obsolete variable function_trace_stop
  MIPS: ftrace: Remove check of obsolete variable function_trace_stop
  parisc: ftrace: Remove check of obsolete variable function_trace_stop
  sh: ftrace: Remove check of obsolete variable function_trace_stop
  sparc64,ftrace: Remove check of obsolete variable function_trace_stop
  tile: ftrace: Remove check of obsolete variable function_trace_stop
  ftrace: x86: Remove check of obsolete variable function_trace_stop
  ...
2014-08-04 11:50:00 -07:00
Linus Torvalds
f2a84170ed Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:

 - Major reorganization of percpu header files which I think makes
   things a lot more readable and logical than before.

 - percpu-refcount is updated so that it requires explicit destruction
   and can be reinitialized if necessary.  This was pulled into the
   block tree to replace the custom percpu refcnting implemented in
   blk-mq.

 - In the process, percpu and percpu-refcount got cleaned up a bit

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
  percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
  percpu-refcount: require percpu_ref to be exited explicitly
  percpu-refcount: use unsigned long for pcpu_count pointer
  percpu-refcount: add helpers for ->percpu_count accesses
  percpu-refcount: one bit is enough for REF_STATUS
  percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
  workqueue: stronger test in process_one_work()
  workqueue: clear POOL_DISASSOCIATED in rebind_workers()
  percpu: Use ALIGN macro instead of hand coding alignment calculation
  percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
  percpu: preffity percpu header files
  percpu: use raw_cpu_*() to define __this_cpu_*()
  percpu: reorder macros in percpu header files
  percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
  percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
  percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
  percpu: reorganize include/linux/percpu-defs.h
  percpu: move accessors from include/linux/percpu.h to percpu-defs.h
  percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
  percpu: introduce arch_raw_cpu_ptr()
  ...
2014-08-04 10:09:27 -07:00
Linus Torvalds
f74ad8df4e PCI changes for the v3.17 merge window:
Resource management
     - Support BAR sizes up to 128GB (Yinghai Lu)
     - Keep original resource if we fail to expand it (Guo Chao)
     - Return conventional error values from pci_revert_fw_address() (Bjorn Helgaas)
     - Tidy resource assignment messages (Bjorn Helgaas)
     - Don't exclude low BIOS area for non-PCI cards (Christoph Schulz)
 
   PCI device hotplug
     - Prevent NULL dereference during pciehp probe (Andreas Noever)
     - Make pciehp pcie_wait_cmd() self-contained (Bjorn Helgaas)
     - Wait for pciehp hotplug command completion lazily (Bjorn Helgaas)
     - Compute pciehp timeout from hotplug command start time (Bjorn Helgaas)
     - Remove pciehp assumptions about which commands cause completion events (Bjorn Helgaas)
     - Clear pciehp Data Link Layer State Changed during init (Myron Stowe)
     - Remove pciehp struct controller.no_cmd_complete (Rajat Jain)
     - Remove cpqphp unnecessary null test (Fabian Frederick)
     - Remove "invalid IRQ" warning for hot-added PCIe ports (Jiang Liu)
 
   IOMMU
     - Add DMA alias quirk for Intel 82801 bridge (Alex Williamson)
 
   MSI
     - Add internal msix_clear_and_set_ctrl() (Yijing Wang)
     - Remove unused msi_enabled_mask() (Yijing Wang)
     - Cache Multiple Message Capable in struct msi_desc (Yijing Wang)
     - Add msi_setup_entry() to clean up initialization (Yijing Wang)
     - Remove unused msi_remove_pci_irq_vectors() (Yijing Wang)
     - Retrieve first MSI IRQ from msi_desc rather than pci_dev (Yijing Wang)
     - Remove unused list access in __pci_restore_msix_state() (Yijing Wang)
     - Use irq_get_msi_desc() to simplify code (Yijing Wang)
 
   Generic host bridge driver
     - Fix GPL v2 license string typo (Bjorn Helgaas)
 
   Marvell MVEBU
     - Fix GPL v2 license string typo (Thierry Reding)
 
   NVIDIA Tegra
     - Use correct initial HW settings (Phil Edworthy)
     - Remove rcar_pcie_setup_window() resource argument (Phil Edworthy)
     - Fix GPL v2 license string typo (Thierry Reding)
 
   Renesas R-Car
     - Remove redundant config accessor register checks (Sergei Shtylyov)
     - Fix GPL v2 license string typo (Bjorn Helgaas)
 
   Virtualization
     - Factor secondary bus reset logic (Gavin Shan)
     - Remove duplicate powerpc reset logic (Gavin Shan)
 
   Miscellaneous
     - Rework default VGA detection for EFI (Bruno Prémont)
     - Fix sysfs "acpi_index" and "label" errors for NIC renaming (Simone Gotti)
     - Configure ASPM at pci_enable_device()-time (Vidya Sagar)
     - Add include/linux/pci_ids.h include guard (Rasmus Villemoes)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTzppBAAoJEFmIoMA60/r8mtQP/jgVWCSU+0ulHjoxVSRLu4Lc
 UGKQFjS03oWWflHdvW6wZFqN82Ynva9fYCLMtiKdPg7cgTosSRT3I4DjAIm80ZI/
 kZvHxSmi6DBYmchZBsWzj60zxNiYZeEgd7CevzcJRHuwbKNMr2y12s6hjJbyl5lF
 ygaXWpDveKjsEDjyk9vKjUGwul/NJKynar253Yh178XaoypdGuiEIw3D1lQFMZZp
 ADcRijIi+CD2BENtDr6fbldbj+yQ93yyUSloEnaKtWZD+Ao5IsHngN0IyRu+l1Wl
 LFob0AsopeYVFKdw22Gn1KAq9Jj01acsSBRXjgrauU+tLY512Vkbp1lFYl85B/38
 /Z0VNHncmIh29rq9Tl2xQwEeI3Ja27FfnMjC70dLM5YjWf8vsYnDEQZHyxAAe15D
 p3H3YuuDjmvHkoSrHY/68DLfDl9ubw3/BFUlCMqijL7444ZWLXathrnCV8ZJimmr
 PlF/m7GtXYF4wIw19m9KQqNBUPJJEsVHExKzICOY4v5/nMlvx4ZkBDR3tPNEH1sk
 3AYKjLDw21Nle7yKcAlxDI/TYWZqxuph23UpevzlQd16tutq2i2FqpauiqI3DFm4
 VfYVbOVQwfeUJt11VOCgxvE7RsTxCk5QefB+YKVAdVK6vMZHeZxsetYvrCDptnea
 cId/NfiEFnmr+u3mAyPM
 =U5Ip
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "I'll be on vacation until Aug 11, and I suspect the merge window will
  open before then, so I'm sending this to you early.  There are more
  things I'd like to get into v3.17, so I hope to send another pull
  request soon after I return.

  The most notable pieces here are:

   - Support BARs up to 128GB (up from 8GB)
   - Fix SR-IOV resource assignment when we fail to expand a resource
   - Rework pciehp to handle a common hardware erratum
   - Cleanup MSI
   - Fix NIC renaming issue
   - Fix VGA default device issue on EFI systems
   - Fix ASPM configuration (previously we didn't enable it as expected)

  Alex Williamson has graciously agreed to take care of any major issues
  with this if you take it before I return.

  Details:

  Resource management
    - Support BAR sizes up to 128GB (Yinghai Lu)
    - Keep original resource if we fail to expand it (Guo Chao)
    - Return conventional error values from pci_revert_fw_address() (Bjorn Helgaas)
    - Tidy resource assignment messages (Bjorn Helgaas)
    - Don't exclude low BIOS area for non-PCI cards (Christoph Schulz)

  PCI device hotplug
    - Prevent NULL dereference during pciehp probe (Andreas Noever)
    - Make pciehp pcie_wait_cmd() self-contained (Bjorn Helgaas)
    - Wait for pciehp hotplug command completion lazily (Bjorn Helgaas)
    - Compute pciehp timeout from hotplug command start time (Bjorn Helgaas)
    - Remove pciehp assumptions about which commands cause completion events (Bjorn Helgaas)
    - Clear pciehp Data Link Layer State Changed during init (Myron Stowe)
    - Remove pciehp struct controller.no_cmd_complete (Rajat Jain)
    - Remove cpqphp unnecessary null test (Fabian Frederick)
    - Remove "invalid IRQ" warning for hot-added PCIe ports (Jiang Liu)

  IOMMU
    - Add DMA alias quirk for Intel 82801 bridge (Alex Williamson)

  MSI
    - Add internal msix_clear_and_set_ctrl() (Yijing Wang)
    - Remove unused msi_enabled_mask() (Yijing Wang)
    - Cache Multiple Message Capable in struct msi_desc (Yijing Wang)
    - Add msi_setup_entry() to clean up initialization (Yijing Wang)
    - Remove unused msi_remove_pci_irq_vectors() (Yijing Wang)
    - Retrieve first MSI IRQ from msi_desc rather than pci_dev (Yijing Wang)
    - Remove unused list access in __pci_restore_msix_state() (Yijing Wang)
    - Use irq_get_msi_desc() to simplify code (Yijing Wang)

  Generic host bridge driver
    - Fix GPL v2 license string typo (Bjorn Helgaas)

  Marvell MVEBU
    - Fix GPL v2 license string typo (Thierry Reding)

  NVIDIA Tegra
    - Use correct initial HW settings (Phil Edworthy)
    - Remove rcar_pcie_setup_window() resource argument (Phil Edworthy)
    - Fix GPL v2 license string typo (Thierry Reding)

  Renesas R-Car
    - Remove redundant config accessor register checks (Sergei Shtylyov)
    - Fix GPL v2 license string typo (Bjorn Helgaas)

  Virtualization
    - Factor secondary bus reset logic (Gavin Shan)
    - Remove duplicate powerpc reset logic (Gavin Shan)

  Miscellaneous
    - Rework default VGA detection for EFI (Bruno Prémont)
    - Fix sysfs "acpi_index" and "label" errors for NIC renaming (Simone Gotti)
    - Configure ASPM at pci_enable_device()-time (Vidya Sagar)
    - Add include/linux/pci_ids.h include guard (Rasmus Villemoes)"

* tag 'pci-v3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (38 commits)
  PCI/MSI: Use irq_get_msi_desc() to simplify code
  PCI/MSI: Remove unused list access in __pci_restore_msix_state()
  PCI/MSI: Retrieve first MSI IRQ from msi_desc rather than pci_dev
  PCI/MSI: Remove unused function msi_remove_pci_irq_vectors()
  PCI/MSI: Add msi_setup_entry() to clean up MSI initialization
  PCI: Configure ASPM when enabling device
  x86: don't exclude low BIOS area when allocating address space for non-PCI cards
  PCI: generic: Fix GPL v2 license string typo
  PCI: rcar: Fix GPL v2 license string typo
  PCI: tegra: Fix GPL v2 license string typo
  PCI: mvebu: Fix GPL v2 license string typo
  PCI: Add include guard to include/linux/pci_ids.h
  x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup()
  PCI: Tidy resource assignment messages
  PCI: Return conventional error values from pci_revert_fw_address()
  PCI: Cleanup control flow
  PCI: Support BAR sizes up to 128GB
  PCI: cpqphp: Remove unnecessary null test before debugfs_remove()
  PCI: pciehp: Clear Data Link Layer State Changed during init
  PCI: Add bridge DMA alias quirk for Intel 82801 bridge
  ...
2014-08-04 09:29:37 -07:00
Mark Brown
a6ce305207 Merge remote-tracking branches 'asoc/topic/intel', 'asoc/topic/kirkwood', 'asoc/topic/max98090' and 'asoc/topic/mc13783' into asoc-next 2014-08-04 16:31:45 +01:00
Dave Hansen
d17d8f9ded x86/mm: Add tracepoints for TLB flushes
We don't have any good way to figure out what kinds of flushes
are being attempted.  Right now, we can try to use the vm
counters, but those only tell us what we actually did with the
hardware (one-by-one vs full) and don't tell us what was actually
_requested_.

This allows us to select out "interesting" TLB flushes that we
might want to optimize (like the ranged ones) and ignore the ones
that we have very little control over (the ones at context
switch).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-31 08:48:51 -07:00
Dave Hansen
e9f4e0a9fe x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
I think the flush_tlb_mm_range() code that tries to tune the
flush sizes based on the CPU needs to get ripped out for
several reasons:

1. It is obviously buggy.  It uses mm->total_vm to judge the
   task's footprint in the TLB.  It should certainly be using
   some measure of RSS, *NOT* ->total_vm since only resident
   memory can populate the TLB.
2. Haswell, and several other CPUs are missing from the
   intel_tlb_flushall_shift_set() function.  Thus, it has been
   demonstrated to bitrot quickly in practice.
3. It is plain wrong in my vm:
	[    0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] tlb_flushall_shift: 6
   Which leads to it to never use invlpg.
4. The assumptions about TLB refill costs are wrong:
	http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com
    (more on this in later patches)
5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59
   I believe the sample times were too short.  Running the
   benchmark in a loop yields times that vary quite a bit.

Note that this leaves us with a static ceiling of 1 page.  This
is a conservative, dumb setting, and will be revised in a later
patch.

This also removes the code which attempts to predict whether we
are flushing data or instructions.  We expect instruction flushes
to be relatively rare and not worth tuning for explicitly.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-31 08:48:50 -07:00
David Rientjes
2f078b9cb8 x86, apic: Remove enable_apic_mode callback
The enable_apic_mode() apic callback is never called, so remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302352320.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:44 -07:00
David Rientjes
11a8318ef5 x86, apic: Remove setup_portio_remap callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the setup_portio_remap() apic callback has been obsolete.  Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351480.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:44 -07:00
David Rientjes
e76661ba09 x86, apic: Remove multi_timer_check callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the multi_timer_check() apic callback has been obsolete.  Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351120.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:43 -07:00
David Rientjes
658ffd7e6f x86, apic: Remove check_apicid_present callback
The check_apicid_present() apic callback is never called, so remove it
and functions that implement it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302350160.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:42 -07:00
David Rientjes
c460b5d340 x86, apic: Remove mps_oem_check callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the mps_oem_check() apic callback has been obsolete.  Remove it.

This allows generic_mps_oem_check() to be removed as well.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349390.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:42 -07:00
David Rientjes
300eddf967 x86, apic: Remove smp_callin_clear_local_apic callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the smp_callin_clear_local_apic() apic callback has been obsolete.
Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349040.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:41 -07:00
David Rientjes
6ab1b27c84 x86, apic: Replace trampoline physical addresses with defaults
The trampoline_phys_{high,low} members of struct apic are always
initialized to DEFAULT_TRAMPOLINE_PHYS_HIGH and TRAMPOLINE_PHYS_LOW,
respectively.  Hardwire the constants and remove the unneeded members.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348330.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:41 -07:00
David Rientjes
80a2670379 x86, apic: Remove x86_32_numa_cpu_node callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the x86_32_numa_cpu_node() apic callback has been obsolete.  Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348060.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:40 -07:00
Andy Lutomirski
7209a75d20 x86_64/entry/xen: Do not invoke espfix64 on Xen
This moves the espfix64 logic into native_iret.  To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.

This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).

[ hpa: this is a nonzero cost on native, but probably not enough to
  measure. Xen needs to fix this in their own code, probably doing
  something equivalent to espfix64. ]

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2014-07-28 15:25:40 -07:00
Henrique de Moraes Holschuh
ebc14ddcc9 x86, microcode, intel: Fix total_size computation
According to the Intel SDM vol 3A (order code 253668-051US, June 2014),
on section 9.11.1, page 9-28:

"For microcode updates with a data size field equal to 00000000H, the
size of the microcode update is 2048 bytes. The first 48 bytes contain
the microcode update header. The remaining 2000 bytes contain encrypted
data."

"For microcode updates with a data size not equal to 00000000H, the total
size field specifies the size of the microcode update."

Up to 2002/2003, Intel used an "old format" for the microcode update
containers that was always 2048 bytes in size. That old format did not
have Data Size and Total Size fields, the quadwords at those positions
in the microcode container header were "reserved". The microcode header
of the "old format" microcode container has a hrdver of 0x01. You can
hunt down an old copy of the Intel SDM to validate this through its
order number (#243192). I found one from 1999 through a Google search.

Sometime in 2002/2003 (AFAICT, for the Prescott processors), Intel
documented a new format for the microcode containers and contributed in
2003 some code to the Linux kernel microcode driver implementing support
for the new format. This new format has Data Size and Total Size fields,
as well as the optional extended signature table. However, it reuses the
same hrdver as the old format (0x01), and it can only be told apart from
the old format by a non-zero Data Size field.

In fact, the only reason we can even trust a Data Size of zero to mean
that the microcode container is in the old format, is because Intel
reatroatively promised that the old format would always have a zero
there when they wrote the documentation for the _new_ format.

This is a very old bug, dating back to 2003. It has been dormant
ever since, as Intel seems to set all reserved fields to zero on the
microcode updates they distribute: I could not find a public microcode
update that would trigger this bug.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Link: http://lkml.kernel.org/r/1406146251-8540-1-git-send-email-hmh@hmh.eng.br
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-07-28 16:08:02 +02:00
Ingo Molnar
5030c69755 Linux 3.16-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT1VYNAAoJEHm+PkMAQRiGQJwIAKSYp1Uqz5O/e5r0V1TlZKT4
 1B4Njopl57PwSrJQWcGEuH2yHyM896vfPO4L6BJIOfyWzh8kwpQqclDt6uhXoF/v
 OsO1zb/7/j+n/pDZsePqP9AyIgErsHEBgUbhecDqzjN++ITPcZjQ6TIMPglZaumN
 jFAdAZuAaEwqAk8jqN2wlm689Fh9MuUEarHXbXLCqu5RgLrWhFGhp/cTWY62aqnZ
 XfEeQ9KtpRZmlR/IYjerbb1eRH7ZdJsZ88WngLX9dj/JdNxHWBkWQBXGAusXk5Fk
 y6LsIV3TjyBdrRKJ1Ifyg/2EIXHNBs8HxTFGXpjtp2HPuMLDxZOWOWikb9URtNg=
 =Fjf4
 -----END PGP SIGNATURE-----

Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:00:33 +02:00
Rafael J. Wysocki
8989e1cc35 Merge branch 'acpi-config'
* acpi-config:
  ACPI / processor: Introduce ARCH_MIGHT_HAVE_ACPI_PDC
  ACPI: Don't use acpi_lapic in ACPI core code
  ACPI: add config for BIOS table scan
2014-07-27 23:52:48 +02:00
Li, Aubrey
f855911c1f x86/pmc_atom: Expose PMC device state and platform sleep state
Add the following interfaces to exposes PMC device state and sleep
state residency via debugfs:
	/sys/kernel/debugfs/pmc_atom/dev_state
	/sys/kernel/debugfs/pmc_atom/sleep_state

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/53B0FF59.8000600@linux.intel.com
Signed-off-by: Kasagar, Srinidhi <srinidhi.kasagar@intel.com>
Reviewed-by: Rudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com>
Reviewed-by: Joe Perches <joe@perches.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-25 14:12:14 -07:00
Li, Aubrey
b00055cade x86/pmc_atom: Eisable a few S0ix wake up events for S0ix residency
Disable PMC S0IX_WAKE_EN events coming from LPC block(unused) and
also from GPIO_SUS ored dedicated IRQs (must be disabled as per PMC
programming rule), GPIOSCORE ored dedicated IRQs (must be disabled
as per PMC programming rule), GPIO_SUS shared IRQ (not necessary
since the IOAPIC_DS wake event will still work), GPIO_SCORE shared
IRQ (not necessary since the IOAPIC_DS wake event will still work).

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/53B0FF22.5080403@linux.intel.com
Signed-off-by: Olivier Leveque <olivier.leveque@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-25 14:11:58 -07:00
Li, Aubrey
93e5eadd1f x86/platform: New Intel Atom SOC power management controller driver
The Power Management Controller (PMC) controls many of the power
management features present in the Atom SoC. This driver provides
a native power off function via PMC PCI IO port.

On some ACPI hardware-reduced platforms(e.g. ASUS-T100), ACPI sleep
registers are not valid so that (*pm_power_off)() is not hooked by
acpi_power_off(). The power off function in this driver is installed
only when pm_power_off is NULL.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/53B0FEEA.3010805@linux.intel.com
Signed-off-by: Lejun Zhu <lejun.zhu@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-25 14:11:29 -07:00
Lv Zheng
d334c823b2 ACPICA: Linux: Add support to exclude <asm/acenv.h> inclusion.
The forthcoming patch will make <acpi/acpi.h> to be visible to all kernel
source code. Thus for the architectures that do not support ACPI and
haven't implemented <asm/acenv.h>, we need to make it excluded.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-23 01:10:44 +02:00
Nadav Amit
6f43ed01e8 KVM: x86: DR6/7.RTM cannot be written
Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 17:17:52 +02:00
Nadav Amit
c9cdd085bb KVM: x86: Defining missing x86 vectors
Defining XE, XM and VE vector numbers.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 14:18:51 +02:00
Graeme Gregory
b50154d53e ACPI: Don't use acpi_lapic in ACPI core code
Now ARM64 support is being added to ACPI so architecture specific
values can not be used in core ACPI code.

Following on the patch "ACPI / processor: Check if LAPIC is present
during initialization" which uses acpi_lapic in acpi_processor.c,
on ARM64 platform, GIC is used instead of local APIC, so acpi_lapic
is not a suitable value for ARM64.

What is actually important at this point is if there is/are CPU
entry/entries (Local APIC/SAPIC, GICC) in MADT, so introduce
acpi_has_cpu_in_madt() to be arch specific and generic.

Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21 13:50:58 +02:00
Matt Fleming
44be28e9dd x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag
It appears that the BayTrail-T class of hardware requires EFI in order
to powerdown and reboot and no other reliable method exists.

This quirk is generally applicable to all hardware that has the ACPI
Hardware Reduced bit set, since usually ACPI would be the preferred
method.

Cc: Len Brown <len.brown@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:52 +01:00
Steven Rostedt (Red Hat)
1026ff9b8e ftrace/x86: Have function graph tracer use its own trampoline
The function graph trampoline is called from the function trampoline
and both do a save and restore of registers. The save of registers
done by the function trampoline when only the function graph tracer
is running is a waste of CPU cycles.

As the function graph tracer trampoline in x86 is dependent from
the function trampoline, we can call it directly when a function
is only being traced by the function graph trampoline.

Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-17 09:44:37 -04:00
Davidlohr Bueso
3a6bfbc91d arch, locking: Ciao arch_mutex_cpu_relax()
The arch_mutex_cpu_relax() function, introduced by 34b133f, is
hacky and ugly. It was added a few years ago to address the fact
that common cpu_relax() calls include yielding on s390, and thus
impact the optimistic spinning functionality of mutexes. Nowadays
we use this function well beyond mutexes: rwsem, qrwlock, mcs and
lockref. Since the macro that defines the call is in the mutex header,
any users must include mutex.h and the naming is misleading as well.

This patch (i) renames the call to cpu_relax_lowlatency  ("relax, but
only if you can do it with very low latency") and (ii) defines it in
each arch's asm/processor.h local header, just like for regular cpu_relax
functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
and thus we can take it out of mutex.h. While this can seem redundant,
I believe it is a good choice as it allows us to move out arch specific
logic from generic locking primitives and enables future(?) archs to
transparently define it, similarly to System Z.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharat Bhushan <r65777@freescale.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Stratos Karafotis <stratosk@semaphore.gr>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-cris-kernel@axis.com
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux@lists.openrisc.net
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-metag@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-17 12:32:47 +02:00
Ingo Molnar
b5e4111f02 Merge branch 'locking/urgent' into locking/core, before applying larger changes and to refresh the branch with fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-17 11:45:29 +02:00
Alexander Yarygin
44b3802122 perf kvm: Use defines of kvm events
Currently perf-kvm uses string literals for kvm event names, but it
works only for x86, because other architectures may have other names for
those events.

To reduce dependence on architecture, we add <asm/kvm_perf.h> file with
defines for:

- kvm_entry and kvm_exit events,
- exit reason field name in kvm_exit event,
- length of exit reasons strings,
- vcpu_id field name in kvm trace events,

and replace literals in perf-kvm.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1404397747-20939-2-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-07-16 17:57:32 -03:00
Borislav Petkov
af0fa6f6b5 x86, cpu: Kill cpu_has_mp
It was used only for checking for some K7s which didn't have MP support,
see

http://www.hardwaresecrets.com/article/How-to-Transform-an-Athlon-XP-into-an-Athlon-MP/24

and it was unconditionally set on 64-bit for no reason. Kill it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403609105-8332-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-14 12:21:40 -07:00
Borislav Petkov
80a208bd39 x86/cpufeature: Add bug flags to /proc/cpuinfo
Dump the flags which denote we have detected and/or have applied bug
workarounds to the CPU we're executing on, in a similar manner to the
feature flags.

The advantage is that those are not accumulating over time like the CPU
features.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403609105-8332-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-14 12:21:39 -07:00
Oren Twaig
411cf9ee29 x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box()
When a vSMP Foundation box is detected, the function apic_cluster_num() counts
the number of APIC clusters found. If more than one found, a multi board
configuration is assumed, and TSC marked as unstable. This behavior is
incorrect as vSMP Foundation may use processors from single node only, attached
to memory of other nodes - and such node may have more than one APIC cluster
(typically any recent intel box has more than single APIC_CLUSTERID(x)).

To fix this, we simply remove the code which detects a vSMP Foundation box and
affects apic_is_clusted_box() return value. This can be done because later the
kernel checks by itself if the TSC is stable using the
check_tsc_sync_[source|target]() functions and marks TSC as unstable if needed.

Acked-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1404036068-11674-1-git-send-email-oren@scalemp.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-13 17:48:03 -07:00
Borislav Petkov
b08ee5f7e4 x86: Simplify __HAVE_ARCH_CMPXCHG tests
Both the 32-bit and 64-bit cmpxchg.h header define __HAVE_ARCH_CMPXCHG
and there's ifdeffery which checks it. But since both bitness define it,
we can just as well move it up to the main cmpxchg header and simpify a
bit of code in doing that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140711104338.GB17083@pd.tnic
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-11 17:28:51 -07:00
Andy Lutomirski
e6577a7ce9 x86, vdso: Move the vvar area before the vdso text
Putting the vvar area after the vdso text is rather complicated: it
only works of the total length of the vdso text mapping is known at
vdso link time, and the linker doesn't allow symbol addresses to
depend on the sizes of non-allocatable data after the PT_LOAD
segment.

Moving the vvar area before the vdso text will allow is to safely
map non-allocatable data after the vdso text, which is a nice
simplification.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/156c78c0d93144ff1055a66493783b9e56813983.1405040914.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-11 16:57:51 -07:00
Paolo Bonzini
17052f16a5 KVM: emulate: put pointers in the fetch_cache
This simplifies the code a bit, especially the overflow checks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:03 +02:00
Bandan Das
41061cdb98 KVM: emulate: do not initialize memopp
rip_relative is only set if decode_modrm runs, and if you have ModRM
you will also have a memopp.  We can then access memopp unconditionally.
Note that rip_relative cannot be hoisted up to decode_modrm, or you
break "mov $0, xyz(%rip)".

Also, move typecast on "out of range value" of mem.ea to decode_modrm.

Together, all these optimizations save about 50 cycles on each emulated
instructions (4-6%).

Signed-off-by: Bandan Das <bsd@redhat.com>
[Fix immediate operands with rip-relative addressing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:01 +02:00
Bandan Das
573e80fe04 KVM: emulate: rework seg_override
x86_decode_insn already sets a default for seg_override,
so remove it from the zeroed area. Also replace set/get functions
with direct access to the field.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:01 +02:00
Bandan Das
c44b4c6ab8 KVM: emulate: clean up initializations in init_decode_cache
A lot of initializations are unnecessary as they get set to
appropriate values before actually being used. Optimize
placement of fields in x86_emulate_ctxt

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:00 +02:00
Bandan Das
1498507a47 KVM: emulate: move init_decode_cache to emulate.c
Core emulator functions all belong in emulator.c,
x86 should have no knowledge of emulator internals

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:59 +02:00
Paolo Bonzini
54cfdb3e95 KVM: emulate: speed up emulated moves
We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.

Avoiding a call to memcpy (the compiler optimizes it out) gains about
200 cycles on kvm-unit-tests for register-to-register moves, and makes
them about as fast as arithmetic instructions.

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:58 +02:00
Paolo Bonzini
37ccdcbe07 KVM: x86: return all bits from get_interrupt_shadow
For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.

However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.

Clean the callback up, and modify toggle_interruptibility to
match the comment above the call.  As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:56 +02:00
Bruno Prémont
20cde69402 x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup()
Commit b4aa016305 ("efifb: Implement vga_default_device() (v2)") added
efifb vga_default_device() so EFI systems that do not load shadow VBIOS or
setup VGA get proper value for boot_vga PCI sysfs attribute on the
corresponding PCI device.

Xorg doesn't detect devices when boot_vga=0, e.g., on some EFI systems such
as MacBookAir2,1.  Xorg detects the GPU and finds the DRI device but then
bails out with "no devices detected".

Note: When vga_default_device() is set boot_vga PCI sysfs attribute
reflects its state.  When unset this attribute is 1 whenever
IORESOURCE_ROM_SHADOW flag is set.

With introduction of sysfb/simplefb/simpledrm efifb is getting obsolete
while having native drivers for the GPU also makes selecting sysfb/efifb
optional.

Remove the efifb implementation of vga_default_device() and initialize
vgaarb's vga_default_device() with the PCI GPU that matches boot
screen_info in pci_fixup_video().

[bhelgaas: remove unused "dev" in efifb_setup()]
Fixes: b4aa016305 ("efifb: Implement vga_default_device() (v2)")
Tested-by: Anibal Francisco Martinez Cortina <linuxkid.zeuz@gmail.com>
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Matthew Garrett <matthew.garrett@nebula.com>
CC: stable@vger.kernel.org	# v3.5+
2014-07-10 16:48:48 -06:00
Tomasz Grabiec
0d3da0d26e KVM: x86: fix TSC matching
I've observed kvmclock being marked as unstable on a modern
single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.

The culprit was failure in TSC matching because of overflow of
kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
in a single synchronization cycle.

Turns out that qemu does multiple TSC writes during init, below is the
evidence of that (qemu-2.0.0):

The first one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]

The second one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
 #4  cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
 #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
 #6  main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390

The third one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
 #4  cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
 #5  cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
 #6  main  at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482

The fix is to count each vCPU only once when matched, so that
nr_vcpus_matched_tsc holds the size of the matched set. This is
achieved by reusing generation counters. Every vCPU with
this_tsc_generation == cur_tsc_generation is in the matched set. The
match set is cleared by setting cur_tsc_generation to a value which no
other vCPU is set to (by incrementing it).

I needed to bump up the counter size form u8 to u64 to ensure it never
overflows. Otherwise in cases TSC is not written the same number of
times on each vCPU the counter could overflow and incorrectly indicate
some vCPUs as being in the matched set. This scenario seems unlikely
but I'm not sure if it can be disregarded.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:57 +02:00
Jan Kiszka
6cbc5f5a80 KVM: nSVM: Set correct port for IOIO interception evaluation
Obtaining the port number from DX is bogus as a) there are immediate
port accesses and b) user space may have changed the register content
while processing the PIO access. Forward the correct value from the
instruction emulator instead.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:56 +02:00
Ard Biesheuvel
f23cf8bd5c efi/x86: efistub: Move shared dependencies to <asm/efi.h>
This moves definitions depended upon both by code under arch/x86/boot
and under drivers/firmware/efi to <asm/efi.h>. This is in preparation of
turning the stub code under drivers/firmware/efi into a static library.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-07 20:29:46 +01:00
Tejun Heo
b9cd18de4d ptrace,x86: force IRET path after a ptrace_stop()
The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values.  That is
very much part of why it's faster than 'iret'.

Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.

However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'.  Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.

Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03 17:27:23 -07:00
Linus Torvalds
4f23174981 A bunch of one-liners (except the s390 one).
The two more serious bugs ("KVM: SVM: Fix CPL export via SS.DPL" and
 "KVM: s390: add sie.h uapi header file to Kbuild and remove header
 dependency") were introduced in the 3.16 merge window.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTso6OAAoJEBvWZb6bTYbyB5IP/j/1d0hKsVjOGMco+dJ3uwjh
 X2gYBVxT3Nm9fyjAjSM6OjbFF2mj9zFNEGu0NvEaDnIlWtifvtXckFB1asp/o3/M
 UTbeaaN5US9Ou9W87KpJQLp7Wp9ENMgXeFsywpf9qMNyV04OHSP5cCwIspShCkNH
 r21oEwgvrnc4Trh4oBVUaykuPuU4mzAMBSiXxbQWVXkkPYBVBjGYNzdas7K3EfS9
 YmY1NgS1HDrUvRuM0b3guHqEizA717xFxpVpXAYhuxRqb1fRWMDuIy7q21hEoXxE
 RtuFjztfpAWIgK9O2j4mTuqR32nedzsieVMzF486oMPXRrUs+oQ16/AYV7K5eOZF
 Q1QJ5zx7890ncfxjXYMdUTI7d5sDFCc7F3DmRwtWh1jYrsNh8p+VRDTbNdmiNuIa
 1wXkoNpTsFHidXA6Uhl2pfI+o9OOWleEP3bB746RanS4bk54cDrx6UK2gJK6MMHl
 bekjzQrXRlh3qN1mqS+nMShq/vd2G6cCG0Y9ez8/aHrJoU5DPIOQ7IcOt2IZGtwZ
 MiBZAoWgHuYpEV4tXzqjHQy8IAddvGnM3RqWfNc0XLlVwcKosdI4fusg8wKnR02Z
 sLYRfe5BTnHG/ieIlx/iQmRXN04hhIUJiggFZrLizGemGYi16SaN/ixwb4YoA3nH
 ksZxlGKUi9SUftnv0Ph6
 =Rjb4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "A bunch of one-liners (except the s390 one).

  The two more serious bugs ("KVM: SVM: Fix CPL export via SS.DPL" and
  "KVM: s390: add sie.h uapi header file to Kbuild and remove header
  dependency") were introduced in the 3.16 merge window"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Fix CPL export via SS.DPL
  KVM: s390: add sie.h uapi header file to Kbuild and remove header dependency
  MIPS: KVM: Fix memory leak on VCPU
  KVM: x86: preserve the high 32-bits of the PAT register
  kvm: fix wrong address when writing Hyper-V tsc page
  KVM: x86: Increase the number of fixed MTRR regs to 10
2014-07-01 09:27:34 -07:00
Aaron Tomlin
f3aca3d095 nmi: provide the option to issue an NMI back trace to every cpu but current
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current.  For
instance if one was previously captured recently.

This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.

Patch x86 and sparc to support new routine.

[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 16:47:44 -07:00