Commit graph

61159 commits

Author SHA1 Message Date
Jens Axboe
bcd4f3acba splice: direct splicing updates ppos twice
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> reported that he's noticed
nfsd read corruption in recent kernels, and did the hard work of
discovering that it's due to splice updating the file position twice.
This means that the next operation would start further ahead than it
should.

nfsd_vfs_read()
    splice_direct_to_actor()
        while(len) {
            do_splice_to()                     [update sd->pos]
                -> generic_file_splice_read()  [read from sd->pos]
            nfsd_direct_splice_actor()
                -> __splice_from_pipe()        [update sd->pos]

There's nothing wrong with the core splice code, but the direct
splicing is an addon that calls both input and output paths.
So it has to take care in locally caching offset so it remains correct.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-16 15:02:48 +02:00
Adrian Bunk
56a68a500f more ACSI removal
This patch removes some code that became dead code after the ATARI_ACSI
removal.

It also indirectly fixes the following bug introduced by
commit c2bcf3b897:

 config ATARI_SLM
        tristate "Atari SLM laser printer support"
-       depends on ATARI && ATARI_ACSI!=n
+       depends on ATARI

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-16 15:02:47 +02:00
Neil Brown
5874c18b10 umem: Fix match of pci_ids in umem driver
the pci device list for umem was not using PCI_DEVICE, so the
subvendor/subdevice fields were not set to ANY, so matching
didn't work properly.

Change to use PCI_DEVICE.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-16 14:39:07 +02:00
Robert P. J. Day
51ea208c37 umem: Remove references to dead CONFIG_MM_MAP_MEMORY variable
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-16 14:39:06 +02:00
Adrian Bunk
a3e4da5483 remove the documentation for the legacy CDROM drivers
This patch removes the documentation for the removed legacy CDROM drivers.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-16 14:39:04 +02:00
David S. Miller
d54bc2793e [SPARC64]: Fix UP build.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:41:51 -07:00
David S. Miller
e0204409df [SPARC64]: dr-cpu unconfigure support.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:32 -07:00
David S. Miller
f3c681c028 [SERIAL]: Fix console write locking in sparc drivers.
Mirror the logic in 8250 for proper console write locking
when SYSRQ is triggered or an OOPS is in progress.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:28 -07:00
David S. Miller
9918cc2e32 [SPARC64]: Give more accurate errors in dr_cpu_configure().
When cpu_up() fails, we can discern the most likely cause.

If cpu_present() is false, this means the cpu did not appear
in the MD.  If -ENODEV is the error return value, then
the processor did not boot properly into the kernel.

Pass this information back in the dr-cpu response packet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:24 -07:00
David S. Miller
39dd992aee [SPARC64]: Clear cpu_{core,sibling}_map[] in smp_fill_in_sib_core_maps()
When we hot-plug in new cpus, the core_id and proc_id of existing
cpus can change.  So in order to set the cpu groups correctly we
need to clear the maps out completely first.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:19 -07:00
David S. Miller
b37d40d175 [SPARC64]: Fix leak when DR added cpu does not bootup.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:15 -07:00
David S. Miller
b53bcb6799 [SPARC64]: Add ->set_affinity IRQ handlers.
dr-cpu unconfigure requests will walk throught he enabled
IRQs and trigger ->set_affinity so that the going-down
cpu no longer has INOs targetted to it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:11 -07:00
David S. Miller
bd0e11ff22 [SPARC64]: Process dr-cpu events in a kthread instead of workqueue.
This will be necessary to handle unconfigure requests
properly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:07 -07:00
David S. Miller
8b99cfb8cc [SPARC64]: More sensible udelay implementation.
Take a page from the powerpc folks and just calculate the
delay factor directly.

Since frequency scaling chips use a system-tick register,
the value is going to be the same system-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:05:02 -07:00
David S. Miller
27a2ef382c [SPARC64]: SMP build fixes.
With the move of ldom_startcpu_cpuid() into smp.c some other
things need to follow along:

1) smp.c is not a driver so we can't use "PFX" macro in the
   printk calls.

2) smp.c now needs asm/io.h and asm/hvtramp.h, ds.c no longer
   does

3) kimage_addr_to_ra() also needs to move into smp.c

While we're here, update copyright info and my email address
in smp.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:58 -07:00
David S. Miller
8f3fff2050 [SPARC64]: mdesc.c needs linux/mm.h
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:53 -07:00
David S. Miller
b14f5c100c [SPARC64]: Fix build regressions added by dr-cpu changes.
Do not select HOTPLUG_CPU from SUN_LDOMS, that causes
HOTPLUG_CPU to be selected even on non-SMP which is
illegal.

Only build hvtramp.o when SMP, just like trampoline.o

Protect dr-cpu code in ds.c with HOTPLUG_CPU.

Likewise move ldom_startcpu_cpuid() to smp.c and protect
it and the call site with SUN_LDOMS && HOTPLUG_CPU.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:49 -07:00
David S. Miller
f8be339c02 [SPARC64]: Unconditionally register vio_bus_type.
The VIO drivers register themselves unconditionally just
like those of any other bus type, so to avoid crashes
on non-VIO systems we need to always register vio_bus_type.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:45 -07:00
David S. Miller
4f0234f4f9 [SPARC64]: Initial LDOM cpu hotplug support.
Only adding cpus is supports at the moment, removal
will come next.

When new cpus are configured, the machine description is
updated.  When we get the configure request we pass in a
cpu mask of to-be-added cpus to the mdesc CPU node parser
so it only fetches information for those cpus.  That code
also proceeds to update the SMT/multi-core scheduling bitmaps.

cpu_up() does all the work and we return the status back
over the DS channel.

CPUs via dr-cpu need to be booted straight out of the
hypervisor, and this requires:

1) A new trampoline mechanism.  CPUs are booted straight
   out of the hypervisor with MMU disabled and running in
   physical addresses with no mappings installed in the TLB.

   The new hvtramp.S code sets up the critical cpu state,
   installs the locked TLB mappings for the kernel, and
   turns the MMU on.  It then proceeds to follow the logic
   of the existing trampoline.S SMP cpu bringup code.

2) All calls into OBP have to be disallowed when domaining
   is enabled.  Since cpus boot straight into the kernel from
   the hypervisor, OBP has no state about that cpu and therefore
   cannot handle being invoked on that cpu.

   Luckily it's only a handful of interfaces which can be called
   after the OBP device tree is obtained.  For example, rebooting,
   halting, powering-off, and setting options node variables.

CPU removal support will require some infrastructure changes
here.  Namely we'll have to process the requests via a true
kernel thread instead of in a workqueue.  workqueues run on
a per-cpu thread, but when unconfiguring we might need to
force the thread to execute on another cpu if the current cpu
is the one being removed.  Removal of a cpu also causes the kernel
to destroy that cpu's workqueue running thread.

Another issue on removal is that we may have interrupts still
pointing to the cpu-to-be-removed.  So new code will be needed
to walk the active INO list and retarget those cpus as-needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:40 -07:00
David S. Miller
b3e13fbeb9 [SPARC64]: Fix setting of variables in LDOM guest.
There is a special domain services capability for setting
variables in the OBP options node.  Guests don't have permanent
store for the OBP variables like a normal system, so they are
instead maintained in the LDOM control node or in the SC.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:36 -07:00
David S. Miller
83292e0a9c [SPARC64]: Fix MD property lifetime bugs.
Property values cannot be referenced outside of
mdesc_grab()/mdesc_release() pairs.  The only major
offender was the VIO bus layer, easily fixed.

Add some commentary to mdesc.h describing these rules.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:33 -07:00
David S. Miller
43fdf27470 [SPARC64]: Abstract out mdesc accesses for better MD update handling.
Since we have to be able to handle MD updates, having an in-tree
set of data structures representing the MD objects actually makes
things more painful.

The MD itself is easy to parse, and we can implement the existing
interfaces using direct parsing of the MD binary image.

The MD is now reference counted, so accesses have to now take the
form:

	handle = mdesc_grab();

	... operations on MD ...

	mdesc_release(handle);

The only remaining issue are cases where code holds on to references
to MD property values.  mdesc_get_property() returns a direct pointer
to the property value, most cases just pull in the information they
need and discard the pointer, but there are few that use the pointer
directly over a long lifetime.  Those will be fixed up in a subsequent
changeset.

A preliminary handler for MD update events from domain services is
there, it is rudimentry but it works and handles all of the reference
counting.  It does not check the generation number of the MDs,
and it does not generate a "add/delete" list for notification to
interesting parties about MD changes but that will be forthcoming.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:28 -07:00
David S. Miller
133f09a169 [SPARC64]: Use more mearningful names for IRQ registry.
All of the interrupts say "LDX RX" and "LDX TX" currently
which is next to useless.  Put a device specific prefix
before "RX" and "TX" instead which makes it much more
useful.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:24 -07:00
David S. Miller
e450992d13 [SPARC64]: Initial domain-services driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:20 -07:00
David S. Miller
13077d8028 [SPARC64]: Export powerd facilities for external entities.
Besides the existing usage for power-button interrupts, we'll
want to make use of this code for domain-services where the
LDOM manager can send reboot requests to the guest node.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:16 -07:00
David S. Miller
2c4f4ecb7a [SPARC64]: Add domain-services nodes to VIO device tree.
They sit under the root of the MD tree unlike the rest of
the LDC channel based virtual devices.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:13 -07:00
David S. Miller
cb48123584 [SPARC64]: Assorted LDC bug cures.
1) LDC_MODE_RELIABLE is deprecated an unused by anything, plus
   it and LDC_MODE_STREAM were mis-numbered.

2) read_stream() should try to read as much as possible into
   the per-LDC stream buffer area, so do not trim the read_nonraw()
   length by the caller's size parameter.

3) Send data ACKs when necessary in read_nonraw().

4) In read_nonraw() when we get a pure ACK, advance the RX head
   unconditionally past it.

5) Provide the ACKID field in the ldcdgb() packet dump in read_nonraw().
   This helps debugging stream mode LDC channel problems.

6) Decrease verbosity of rx_data_wait() so that it is more useful.
   A debugging message each loop iteration is too much.

7) In process_data_ack() stop the loop checking when we hit lp->tx_tail
   not lp->tx_head.

8) Set the seqid field properly in send_data_nack().

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:09 -07:00
David S. Miller
5a606b72a4 [SPARC64]: Do not ACK an INO if it is disabled or inprogress.
This is also a partial workaround for a bug in the LDOM firmware which
double-transmits RX inos during high load.  Without this, such an
event causes the kernel to loop forever in the interrupt call chain
ACK'ing but never actually running the IRQ handler (and thus clearing
the interrupt condition in the device).

There is still a bad potential effect when double INOs occur,
not covered by this changeset.  Namely, if the INO is already on
the per-cpu INO vector list, we still blindly re-insert it and
thus we can end up losing interrupts already linked in after
it.

We could deal with that by traversing the list before insertion,
but that's too expensive for this edge case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:04:05 -07:00
David S. Miller
667ef3c396 [SPARC64]: Add Sun LDOM virtual disk driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:03:56 -07:00
David S. Miller
4c521e422f [SPARC64]: Add Sun LDOM virtual network driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:03:23 -07:00
David S. Miller
e53e97ce3c [SPARC64]: Add LDOM virtual channel driver and VIO device layer.
Virtual devices on Sun Logical Domains are built on top
of a virtual channel framework.  This, with help of hypervisor
interfaces, provides a link layer protocol with basic
handshaking over which virtual device clients and servers
communicate.

Built on top of this is a VIO device protocol which has it's
own handshaking and message types.  At this layer attributes
are exchanged (disk size, network device addresses, etc.)
descriptor rings are registered, and data transfers are
triggers and replied to.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16 04:03:18 -07:00
Avi Kivity
cec9ad279b KVM: Use CPU_DYING for disabling virtualization
Only at the CPU_DYING stage can we be sure that no user process will
be scheduled onto the cpu and oops when trying to use virtualization
extensions.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:51 +03:00
Avi Kivity
4267c41a45 KVM: Tune hotplug/suspend IPIs
The hotplug IPIs can be called from the cpu on which we are currently
running on, so use on_cpu().  Similarly, drop on_each_cpu() for the
suspend/resume callbacks, as we're in atomic context here and only one
cpu is up anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:51 +03:00
Avi Kivity
1b6c016818 KVM: Keep track of which cpus have virtualization enabled
By keeping track of which cpus have virtualization enabled, we
prevent double-enable or double-disable during hotplug, which is a
very fatal oops.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:51 +03:00
Avi Kivity
a52b1752c0 SMP: Allow smp_call_function_single() to current cpu
This removes the requirement for callers to get_cpu() to check in simple
cases.  This patch is for !CONFIG_SMP.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:50 +03:00
Avi Kivity
de48935391 i386: Allow smp_call_function_single() to current cpu
This removes the requirement for callers to get_cpu() to check in simple
cases.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:50 +03:00
Avi Kivity
4055551bbc x86_64: Allow smp_call_function_single() to current cpu
This removes the requirement for callers to get_cpu() to check in simple
cases.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:50 +03:00
Avi Kivity
38ef6d195f HOTPLUG: Adapt thermal throttle to CPU_DYING
CPU_DYING is notified in atomic context, so no taking mutexes here.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:50 +03:00
Avi Kivity
ac076758b9 HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING
CPU_DYING is called in atomic context, so don't try to take any locks.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Avi Kivity
db912f9639 HOTPLUG: Add CPU_DYING notifier
KVM wants a notification when a cpu is about to die, so it can disable
hardware extensions, but at a time when user processes cannot be scheduled
on the cpu, so it doesn't try to use virtualization extensions after they
have been disabled.

This adds a CPU_DYING notification.  The notification is called in atomic
context on the doomed cpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Avi Kivity
e495606dd0 KVM: Clean up #includes
Remove unnecessary ones, and rearange the remaining in the standard order.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Avi Kivity
d6d2816849 KVM: Remove kvmfs in favor of the anonymous inodes source
kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the
new anonymous inodes source does much better.

Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Joerg Roedel
6031a61c2e KVM: SVM: Reliably detect if SVM was disabled by BIOS
This patch adds an implementation to the svm is_disabled function to
detect reliably if the BIOS disabled the SVM feature in the CPU. This
fixes the issues with kernel panics when loading the kvm-amd module on
machines where SVM is available but disabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Avi Kivity
796fd1b23e KVM: VMX: Remove unnecessary code in vmx_tlb_flush()
A vmexit implicitly flushes the tlb; the code is bogus.

Noted by Shaohua Li.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Shaohua Li
88a97f0b2f KVM: MMU: Fix Wrong tlb flush order
Need to flush the tlb after updating a pte, not before.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Avi Kivity
75880a0112 KVM: VMX: Reinitialize the real-mode tss when entering real mode
Protected mode code may have corrupted the real-mode tss, so re-initialize
it when switching to real mode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Luca Tettamanti
a3c870bdce KVM: Avoid useless memory write when possible
When writing to normal memory and the memory area is unchanged the write
can be safely skipped, avoiding the costly kvm_mmu_pte_write.

Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Luca Tettamanti
02c03a326a KVM: Fix x86 emulator writeback
When the old value and new one are the same the emulator skips the
write; this is undesirable when the destination is a MMIO area and the
write shall be performed regardless of the previous value. This
optimization breaks e.g. a Linux guest APIC compiled without
X86_GOOD_APIC.

Remove the check and perform the writeback stage in the emulation unless
it's explicitly disabled (currently push and some 2 bytes instructions
may disable the writeback).

Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Eddie Dong
74906345ff KVM: Add support for in-kernel pio handlers
Useful for the PIC and PIT.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Gregory Haskins
ff1dc7942b KVM: VMX: Fix interrupt checking on lightweight exit
With kernel-injected interrupts, we need to check for interrupts on
lightweight exits too.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00